In my previous posts, I shared how we've been using Natural Language Processing (NLP) to enhance user navigation within our application. Today, I'm thrilled to take you a step further into our journey.
We're now leveraging NLP not just for navigation but also for executing complex commands within the application. This development goes beyond basic keyword matching. Now, user inputs, in any language supported by ChatGPT, are intelligently interpreted to understand the underlying intent. This means our system can now discern the specific command a user intends, whether it's creating new records or navigating through multiple layers of the application.
This advancement is a game-changer in how we interact with our technology, making it more intuitive and responsive than ever.
Mapping Commands to Their Context
We're taking the user experience to the next level. First, the user's input is matched against a set of Prompt Navigation records to reach the right screen or layout. Once on the desired layout, a second API call to ChatGPT maps the same input to the correct command.
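To make that flow concrete, here is a minimal Python sketch of the two API calls. The production version is a set of FileMaker scripts, so everything here is an illustrative stand-in: 'ask_chatgpt', 'navigation_prompt', 'command_prompt', 'go_to_layout', and 'run_command' are hypothetical names, not the application's actual ones.

```python
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = "sk-..."  # your OpenAI API key

def ask_chatgpt(prompt: str) -> str:
    """Send a single prompt to the ChatGPT API and return the reply text."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "gpt-3.5-turbo",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

# Stand-ins for pieces the application itself provides.
def navigation_prompt(user_input: str) -> str:
    # Would be assembled from the Prompt Navigation records.
    return f"Which screen best matches this request? {user_input}"

def command_prompt(layout_id: str, user_input: str) -> str:
    # Would be assembled per layout; a fuller sketch appears later in this post.
    return f"On layout {layout_id}, which command matches: {user_input}"

def go_to_layout(layout_id: str) -> None:
    print(f"[navigate to layout {layout_id}]")

def run_command(cmd_id: str) -> None:
    print(f"[execute command {cmd_id}]")

def handle_user_input(user_input: str) -> None:
    # Call 1: map the input to a target screen/layout.
    layout_id = ask_chatgpt(navigation_prompt(user_input))
    go_to_layout(layout_id)
    # Call 2: once on that layout, map the same input to a command.
    cmd_id = ask_chatgpt(command_prompt(layout_id, user_input))
    run_command(cmd_id)
```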
Users can now bypass traditional approaches like sifting through documentation, help sections, or tutorials. Instead, they can rely on intuitive, natural language commands to navigate and use the application effectively.
To make this possible, we've created 31 ‘navigation’ prompts, ensuring users can seamlessly switch contexts to the right screen or layout. Furthermore, I developed 53 ‘command’ prompts, essentially teaching ChatGPT how to use our application.
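For illustration, here is roughly the shape those two kinds of prompt records might take. The field names and values are hypothetical; the real schema lives in the application's database.

```python
# Hypothetical record shapes; field names are illustrative only.
navigation_prompt_record = {
    "PromptID": "NAV-07",
    "LayoutID": "ALERTS",          # the screen this prompt routes to
    "Description": "Creating and managing Alerts",
}

command_prompt_record = {
    "PromptID": "CMD-12",
    "LayoutID": "ALERTS",          # the context where this command applies
    "CmdID": "CMD01",
    "Description": "Create a new Alert",  # the text ChatGPT matches against
}
```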
This innovation is all about simplifying and enhancing the user experience, making our application more intuitive and accessible than ever.
Create a new Alert.
새 알림을 만듭니다. (Korean: "Create a new alert.")
Crea una nueva alerta. (Spanish: "Create a new alert.")
Créez une nouvelle alerte. (French: "Create a new alert.")
만들다 nouvelle アラート (mixed Korean/French/Japanese: roughly "make new alert")
This is nonsense.
Show the Tutorial
Please do a backup
View my Daily Word Count.
Show me the story “Pompelier”
Change the Language Overrides.
Edit the story “When Worlds Meet”
Please edit my story with the title of “Worlds Meet”
Sample User Inputs
After implementing this new framework, I discovered its remarkable extensibility and tunability. Initially, as I experimented with various inputs, I noticed that the system would occasionally navigate to the wrong screen or execute an incorrect command. However, by refining the prompts, I've been able to cleanly differentiate between different locations and commands within the application. This precision has significantly enhanced the user experience.
It's important to note that while our system covers a broad range of commands, it's not designed to replace every possible command with a Command Line Interface (CLI) equivalent. Instead, my approach is to guide users to the right starting point in the application, allowing them to complete operations using the standard user interface.
So, why retrofit a generative AI onto an existing application? Imagine asking ChatGPT/DALL-E to create a drawing from your textual description: the drawing is the output. Similarly, this integration lets the application interpret textual input and render its interpretation on the canvas of the application. The result is a more flexible application that requires less user training and is more intuitive to use.
Behind the Scenes
I wrote the function 'Get Command by LayoutID' to implement command execution in the application. This addressed several needs:
First, it enabled us to focus on a subset of the total commands available, making our system more efficient and streamlined. This targeted approach not only keeps our token usage to a minimum but also significantly boosts performance.
Moreover, this function allows for enhanced differentiation between commands. By mirroring the commands available on the screens within a specific context, we've made the user experience more intuitive and aligned with the application's layout.
This software application has the following Commands that may be applied within this context. Each Command is delimited by |-|. The CmdID is delimited by |X|. The Description is delimited by |+|.
[List of Commands for a Layout per the $CommandBlock taken from the database]
I want to determine the command for this context in my software application. Give me the CmdID that best matches the following input phrase:
[User input]
Prompts for ChatGPT to map the User Input to the commands
The code for the above prompt appears in lines 33 to 37 of the function below, which is what I use to call the ChatGPT API.
Function ‘Get Command by LayoutID’ (see lines 33 to 37)
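If you're not following along in FileMaker, here is an approximate Python sketch of what 'Get Command by LayoutID' does, assuming one plausible reading of the delimiter scheme above and a hypothetical command table (the names and sample data are illustrative, not the application's actual contents):

```python
# Hypothetical command table; the application reads this from the database.
COMMANDS = [
    {"LayoutID": "ALERTS",  "CmdID": "CMD01", "Description": "Create a new Alert"},
    {"LayoutID": "ALERTS",  "CmdID": "CMD02", "Description": "Edit the selected Alert"},
    {"LayoutID": "STORIES", "CmdID": "CMD10", "Description": "Edit a story by title"},
]

def build_command_prompt(layout_id: str, user_input: str) -> str:
    """Assemble the command-matching prompt shown above for one layout."""
    # $CommandBlock: each command separated by |-|, the CmdID wrapped in |X|
    # and the Description in |+| (one plausible reading of the delimiters).
    block = " |-| ".join(
        f"|X|{c['CmdID']}|X| |+|{c['Description']}|+|"
        for c in COMMANDS
        if c["LayoutID"] == layout_id
    )
    return (
        "This software application has the following Commands that may be "
        "applied within this context. Each Command is delimited by |-|. "
        "The CmdID is delimited by |X|. The Description is delimited by |+|. "
        f"{block} "
        "I want to determine the command for this context in my software "
        "application. Give me the CmdID that best matches the following "
        f"input phrase: {user_input}"
    )

def get_command_by_layout_id(layout_id: str, user_input: str) -> str:
    # Reuses ask_chatgpt from the earlier sketch to get the best-matching CmdID.
    return ask_chatgpt(build_command_prompt(layout_id, user_input))
```

Restricting the block to one layout's commands is what keeps token usage low and makes similar commands easier for the model to tell apart.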
And here is the screen where the prompt administrator manages the Prompts.
Prompt Definition screen
Lessons Learned
Initially, I considered a keyword-based approach for implementing this capability. However, I quickly realized that such a method would be fragile, challenging to maintain, and not scalable. This is particularly true for the users who stand to benefit most from natural language input: matching their diverse vocabulary with a fixed keyword list would be a daunting task. The solution? Leveraging the power of Large Language Models (LLMs) for more effective matching.
One intriguing aspect of using NLP is the complexity of testing. Crafting a set of tests that encompasses the vast range of potential inputs is a formidable challenge. This is where capturing each LLM interaction and encouraging user feedback becomes crucial. This feedback allows us to refine the application in real time, adjusting prompts without altering the core code. However, caution is key: changes to prompts must be carefully managed to avoid disrupting other functionality.
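Here is a minimal sketch of what capturing each interaction could look like, assuming a simple JSON-lines log in place of the application's own tables ('log_interaction' and its fields are hypothetical):

```python
import json
from datetime import datetime, timezone

def log_interaction(prompt_id: str, user_input: str, llm_reply: str,
                    log_path: str = "llm_log.jsonl") -> None:
    """Append one LLM round trip to a JSON-lines file for later review."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "prompt_id": prompt_id,   # which prompt definition was used
        "input": user_input,      # what the user typed
        "reply": llm_reply,       # what the model answered
        "feedback": None,         # filled in if the user flags a bad match
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```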
Reflecting on my career, the dream of solving this problem seemed distant until now. The integration of a robust NLP front-end and the ability to map NLP outputs to specific commands has been a game-changer. While this required adding a new layer of code, the user interface remains clean and easily adaptable.
Looking ahead, I envision a future where the traditional "application" layer might be replaced, allowing direct conversation with the AI to execute business processes. Yet capturing structured data to support those processes remains vital. My ultimate goal is to feed the AI a comprehensive XML definition of every screen, database schema, and function. This, however, is a significant challenge, requiring extensive token usage and training on numerous fully debugged applications.
Stay tuned for my next post, where I'll delve into how ChatGPT can handle two common human-centric tasks in our application: writing a query to a publisher, and managing the publisher's response (S1:E5). For more insights and updates, keep following me here on fmsoup!