Last week, I quickly deployed a chatbot on my blog using Google Gen App Builder. I love how quick and easy the entire process is (and I love the free credit too), but the chatbot has a few areas for improvement. The biggest one for me is getting the chatbot to “synthesize” content across multiple posts rather than simply matching the user query against past content. I am not sure if this is a realistic expectation, but I want to give it a try.
And here is what I have learned after playing with it for a bit:
1. Adding your content as unstructured data into the bot does seem to help
What do I mean by this? Well, besides having Google’s crawler index the live website, you can also add all of your content to the bot using a “Data store.”
After the data store is created, you can add the new store to the chatbot under Agent settings.
After doing this, I have found that the chatbot’s answers are much better. It seems to “know” about the content a lot more.
Google provides the guide here, under “Unstructured data store” and “Upload with metadata”.
Ehhh, but how do I convert my 450+ blog posts into the required format, including a JSON Lines file?
2. Using ChatGPT to help with data cleaning and preparation
I am not a technical person, so all I could do was export my blog content from WordPress to an .XML file. I had to rely on ChatGPT for the code to clean and prepare the data in the required format.
I like ChatGPT in this regard because, with “custom instructions”, it has some basic understanding of my situation and can provide a very detailed step-by-step guide.
The first time I tried to work with ChatGPT to convert the .XML file to .HTML and JSON Lines format, this is what I wrote: “The blog uses wordpress. I can export all published posts from this blog using WordPress. I need to prepare the data so that it can be used to train a large language model. What should I do to prepare this data?”
After following all the steps with the scripts ChatGPT provided and uploading the data to Gen App Builder, I ran into many errors. Basically, the data was not in the format Google expected, so Gen App Builder could not ingest it.
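A quick local check can catch this kind of format problem before the upload step. Here is a minimal sketch of one, not the script used for this blog. It assumes each line of the JSON Lines file should be a JSON object with an "id" and a "content" entry whose "uri" points at a Cloud Storage path (gs://...). Those field names are my reading of Google's unstructured data guide, so check them against the current documentation. The function name check_jsonl is made up for illustration.

```python
import json

# Assumption: keys taken from my reading of Google's guide; verify before relying on them.
REQUIRED_KEYS = ("id", "content")


def check_jsonl(lines):
    """Return a list of (line_number, problem) pairs for lines that look wrong."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"not valid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append((number, f"missing key: {key}"))
        content = record.get("content")
        # Assumption: uploaded files live in a Cloud Storage bucket.
        if isinstance(content, dict) and not str(content.get("uri", "")).startswith("gs://"):
            problems.append((number, "content.uri does not start with gs://"))
    return problems
```

To use it, open the metadata file and pass the file object in, for example check_jsonl(open("metadata.jsonl", encoding="utf-8")). An empty list means no problems were found by these particular checks, which is not the same as Google accepting the file.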
So this is where I learned a valuable lesson: I should have started by giving ChatGPT the entire Google Gen App Builder documentation guide.
Provide ChatGPT with the actual documentation guide
I simply copied and pasted the entire documentation guide from Google Cloud into ChatGPT and asked it to write Python code to convert the data from .XML to the required .HTML and JSON Lines format. This time, because ChatGPT understood the final format and template, the code it generated worked much better and produced far fewer errors when uploading.
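For readers who want a picture of what such a conversion script might look like, here is a minimal sketch. It is not the code ChatGPT generated for this blog. It assumes a standard WordPress export (an RSS-style file with one item element per post), and it assumes the metadata layout with "id", "structData" and "content" fields, which is my reading of Google's guide and should be checked against the current documentation. The function name convert_export, the file name metadata.jsonl and the bucket path are all made up for illustration.

```python
import html
import json
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace WordPress exports use for the full post body (<content:encoded>).
CONTENT_TAG = "{http://purl.org/rss/1.0/modules/content/}encoded"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def convert_export(xml_path, out_dir, bucket_prefix):
    """Write one .html file per published post plus a metadata.jsonl file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []
    root = ET.parse(xml_path).getroot()
    for item in root.iter("item"):
        # Collect simple fields (title, link, post_id, status, post_type) by local name.
        fields = {local_name(child.tag): (child.text or "") for child in item}
        if fields.get("post_type") != "post" or fields.get("status") != "publish":
            continue  # skip pages, attachments and drafts
        title = fields.get("title", "")
        body = item.findtext(CONTENT_TAG, default="")
        post_id = fields.get("post_id", "")
        filename = f"post-{post_id}.html"
        page = (
            f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body>{body}</body></html>"
        )
        (out_dir / filename).write_text(page, encoding="utf-8")
        # Assumption: this record layout follows my reading of Google's guide.
        records.append({
            "id": f"post-{post_id}",
            "structData": {"title": title, "url": fields.get("link", "")},
            "content": {"mimeType": "text/html", "uri": f"{bucket_prefix}/{filename}"},
        })
    # JSON Lines: one JSON object per line.
    with open(out_dir / "metadata.jsonl", "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return records
```

A call such as convert_export("export.xml", "out", "gs://my-bucket/posts") would fill the out folder with the .html files and the metadata file. The .html files would then need to be uploaded to the same Cloud Storage path used in bucket_prefix so that each "uri" points at a real file.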
3. General knowledge of Python does help
I am very much a newbie when it comes to Python, so I have to rely on ChatGPT for most of the actual coding. However, general knowledge of Python helps tremendously because you know what to ask ChatGPT to do. It is super powerful, but it doesn’t know what you don’t know, and it doesn’t know your development environment.
For example, the Python code generated by ChatGPT often leaves out the “shebang” line. Because I know about it, I often ask ChatGPT to include that line in the code. Also, when ChatGPT asks you to do something on the command line, you have a rough idea of why.
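In case the term is new: the shebang is the very first line of a script, starting with #!, and on macOS and Linux it tells the system which interpreter to use when the file is run directly. A minimal sketch (the greet function is just a placeholder):

```python
#!/usr/bin/env python3
# The line above is the shebang. It must be the first line of the file.
# It asks the system to run this script with whichever python3 is found first on the PATH.


def greet(name):
    """Return a short greeting; a stand-in for real work."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greet("blog reader"))
```

With that line in place and the file marked as executable (chmod +x script.py), the script can be started as ./script.py instead of python3 script.py. Running it with python3 script.py works with or without the shebang.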
4. Oct 2023 update
Since I published this post, I have managed to build my own chatbot using the OpenAI API. The chatbot lets you interact with all of the historical content on my blog up to the end of Sep 2023. Its major advantage is that it can synthesize content across multiple posts on the same topic, which was my biggest issue with the off-the-shelf solution. You can check out the chatbot directly here or check out my post about “How I Built My Own Chatbot with No Coding Experience: Lessons Learned.”
That’s it from me. Let me know what you think in the comment section below. Have you tried building a chatbot with Gen AI yet? And give my chatbot a try and tell me what you think. (It can be found in the bottom right-hand corner.)