Generative AI Application Retrofit
Movie for S1:E1 on LinkedIn
Welcome to my series of posts chronicling my journey with the ChatGPT API. As a Product Manager, I'm excited to share real-world applications of Generative AI. To that end, I will be using an application I wrote in FileMaker, which employs a scripting language and packages up my API calls as JSON. At the end of this post is a movie I made on the topic, showing everything described here in action. So, without further ado, I'll get started.
I find ChatGPT to be a bit like magic, and also a bit sloppy in its API responses, despite my efforts to force it into a more controlled output format that is easier to consume. My application, The Writer’s Scribe (TWS), takes stories or articles and creates submissions to one or more publishers, tracking the status of each submission. While the premise is simple, complexity arises in managing rules around multiple and simultaneous submissions, publisher availability, genre preferences, and submission timelines.
To retrofit generative AI to my TWS application, I first looked at the types of problems that would benefit the most from using ChatGPT, and that were easy to do. I already had a function to summarize the source text for an article, and I already had a synopsis field. Many publishers require a synopsis to be included in the cover letter for a submission, so TWS has a template that generates a query letter that includes the synopsis. My goal was to have ChatGPT create a summary that could be used as the synopsis.
The ChatGPT API call works well when I limit the summary to 100 words and apply it to articles under roughly 2,000 words. Since many of my pieces are short stories, that covers most of my content. Below is a screenshot showing where I initiate the summary and where the result is displayed in the synopsis field.
Analyze Source for Synopsis
In this case, I did not flag the synopsis as AI-generated; it is simply a synopsis, and I expect the user to review it as needed. If the call fails, the synopsis is simply left empty. I set the ChatGPT temperature to zero, which reduces creative variation and makes hallucinations much less of a concern. My takeaway for this feature is that it is important to select a model appropriate to the size of the input. In this case, I used “gpt-3.5-turbo-16k”. I did this using a paid account for the API calls, and over several days of testing and generating summaries, the cost was under $2. When creating summaries, the input is much larger than the output, so in the future I may adjust the call to account for that asymmetry and minimize my token use.
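For readers curious what such a call looks like outside FileMaker, here is a minimal Python sketch of the Chat Completions payload with the settings described above (temperature zero, the 16k model). The prompt wording and function name are my own illustration, not the actual TWS script:

```python
import json

def build_summary_request(article_text, word_limit=100):
    """Build a Chat Completions payload that asks for a synopsis.

    Mirrors the settings from the post: temperature 0 to suppress
    creative variation, and gpt-3.5-turbo-16k to leave headroom for
    articles of roughly 2,000 words.
    """
    return {
        "model": "gpt-3.5-turbo-16k",
        "temperature": 0,
        "messages": [
            {
                "role": "system",
                "content": f"Summarize the user's article in at most {word_limit} words.",
            },
            {"role": "user", "content": article_text},
        ],
    }

payload = build_summary_request("Once upon a time, in a town by the sea...")
body = json.dumps(payload)  # this JSON string is what gets POSTed to the API
```

The same structure applies regardless of the client language; FileMaker simply assembles this JSON with its own scripting functions.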
ChatGPT Bring Your Own License Setup
My second retrofit feature was more ambitious. I wanted to add a Command Line Interface (CLI) to the product that would be available throughout the application. This CLI would allow the user to navigate to the appropriate screen, and where possible, to the specific record. It is purely navigational in nature, so it is relatively safe to execute, and if it fails, it tells the user why it failed using the API response. In the future, I would like to expand the CLI to perform queries, commands, and other higher-level processes.
Shown below is one of the screens where I added a button at the top to show a popover for the CLI. For consistency, this button sits in the same position on every screen with the same popover. To use the interface, the user clicks the button, enters free-form text, and then presses “Enter” or clicks the Send button.
AI Popover to Chat for Navigation
This is where the magic of generative AI comes in. To perform simple navigation, I need to provide the API with some context against which to interpret the input. Basically, I need to describe each screen in the application, what it does, and what kind of objects it operates on. I had already implemented a simple Help popover on each screen describing what that screen does, so I used those descriptions to create a prompt that associates each screen with its purpose. I then asked ChatGPT to determine the best fit between the user’s input and the available destinations. In this example, the user input was: “Edit my story ‘When Worlds Meet’.”
Story Record with AI Popover with User Feedback
The full context used for the prompt would be far too much information to expose to the user. So I created what I call a synthetic prompt: I take the context generated from the screen descriptions and screen IDs, then append the user’s input to the end of it. In general, this worked well.
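A synthetic prompt of this kind is straightforward to assemble. The sketch below shows the idea in Python; the screen IDs, descriptions, and instruction wording are hypothetical stand-ins, not the actual TWS screens or prompt text:

```python
def build_navigation_prompt(screens, user_input):
    """Assemble a 'synthetic prompt': screen IDs and descriptions as
    context, with the raw user input appended at the end."""
    context = "\n".join(
        f"{screen_id}: {description}"
        for screen_id, description in screens.items()
    )
    return (
        "You map a user's request to one screen ID from the list below. "
        "Reply with the screen ID only.\n\n"
        f"{context}\n\n"
        f"User request: {user_input}"
    )

# Illustrative screen catalog, modeled on the Help popover descriptions.
screens = {
    "STORIES": "List and edit stories and articles.",
    "PUBLISHERS": "Manage publishers and their submission rules.",
    "SUBMISSIONS": "Track the status of each submission.",
}
prompt = build_navigation_prompt(screens, "Edit my story 'When Worlds Meet'.")
```

The user only ever sees their own one-line request; the screen catalog is prepended invisibly on every call.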
The question remained: does this approach deliver the value my users expect? Since it adds no new capabilities and only addresses the user experience, its value is limited. Still, users who eschew clicking and prefer to simply ask the application to do what they need will benefit. This is especially true of new users, or of those who use the application infrequently. For this group, it is a valuable feature.
Next, does this feature work? I found this a more complex problem to solve. I have no control over what the user enters, so I face the classic problem of “garbage in, garbage out”. To address this, I decided to log all of the calls and allow the user to enter favorable or unfavorable feedback on each call, along with a comment. I anticipate a need to refine the ChatGPT prompts, and this type of input will help because it gives me real-world examples.
I found that simply navigating to the correct tab is not a real win, because, after all, the user could simply click a tab and reach a given record quickly on their own. So I expanded the navigation to land not only on the correct screen, but also on the correct record. ChatGPT is able to extract from the user input whether it includes a personal name or a proper name (e.g., a person’s name versus the name of a story). Given that, based upon the navigation target, the application can then perform a find on that name to move to the matching record or records. In general, my ChatGPT calls take about 3 seconds after I refined the context. So this approach is a win: it supports a chat-like interface, and the user does not need to know the structure of the information.
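One way to make the screen-plus-record handoff robust is to ask the model to reply in JSON and then parse that reply before dispatching the find. The sketch below shows this pattern; the field names (`screen_id`, `name`, `name_type`) are illustrative assumptions, not the application's actual schema:

```python
import json

def plan_navigation(reply_text):
    """Parse the model's JSON reply into a navigation plan.

    Assumes the prompt asked for JSON containing the target screen,
    the extracted name, and whether that name is a person's name or
    a story title. Returns None when the reply is unusable, so the
    caller can report the failure to the user instead of navigating.
    """
    try:
        plan = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if plan.get("screen_id") and plan.get("name"):
        return plan
    return None

# Example reply for: "Edit my story 'When Worlds Meet'."
reply = '{"screen_id": "STORIES", "name": "When Worlds Meet", "name_type": "title"}'
plan = plan_navigation(reply)
# The application would now switch to plan["screen_id"] and run a
# find on plan["name"] in the title (or person-name) field.
```

Validating the reply before acting on it keeps the feature purely navigational and safe: a malformed response simply produces an explanatory error rather than a wrong action.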
To conclude today’s post, here is the interface for the log entries I captured for each of these CLI interactions. This gives me good user feedback so I know where to focus my efforts to improve the prompts. It also provides performance information, which is always important for a good UX.
Log of Generative AI User Prompts
Thank you for reading this post. I hope it gives you a better appreciation of the problems you can solve in an application retrofit. See below for a short movie that illustrates the above. Next time, I’ll delve into my work to refine the CLI as I expand it to cover commands, where I make some much more exciting, and dangerous, enhancements!
Be sure to FOLLOW me for my next update (S1:E2).