Back in early Nov last year, I shipped my DIY chatbot version 0.1. At that time, I wrote, “While v0.1 represented a big first step as a coding novice, it had significant limitations.” Well, as it turned out, that was an understatement. I didn’t appreciate how clunky the entire process and my build were. The reality was that my initial attempt, though earnest, was more of a prototype cobbled together with enthusiasm but limited know-how.
This post isn’t just a follow-up; it’s a deep dive into the journey that unfolded from that point—a journey filled with trials, errors, and invaluable lessons. I’m laying bare the nuts and bolts of this adventure, not just for the sake of transparency but in the hope that my experiences, detailed as they are, might resonate with or even assist someone else on a similar path. (For additional context, I’m a middle-aged advertising professional with zero prior coding experience.)
Table of Contents
- My “valley of death”
- Incremental Wins Over Time
- Rebuilding the Foundation
- Share this with a friend
- Feb 14 update: Chatbot v2.10 Unveiled
- Mar 25 update: From Frontend Upgrades to Docker Struggles and Breakthroughs
As mentioned, to ship v0.1 of my chatbot, I mainly followed the instructions from this short course “Building Systems with the ChatGPT API” and two cookbooks from OpenAI: Question answering using embeddings-based search and How to count tokens with tiktoken.
Here is why my chatbot v0.1 was terrible:
- Chunking: I simply split long blog posts into smaller chunks based on token length, i.e. simple static character chunking of the data. The most primitive way to split text possible :D (see the sketch after this list for roughly what that looked like). If you want to understand why this is a terrible idea, read about the 5 levels of text splitting from Greg Kamradt here.
- Embedding: I struggled with embeddings. I used OpenAI’s embedding model but kept hitting API rate limits, which caused the embedding process to fail halfway. I then learned to batch requests and add timeouts between batches to stay under the limits. Ultimately, I saved the generated embeddings into a simple .csv file as my makeshift “database”.
- Database: I knew a CSV wasn’t optimal for a database, but lacked the skills for better alternatives.
- Metadata: I initially didn’t realize including metadata like publish dates and post URLs was important for chatbots to answer user questions accurately. I had to repeat embedding and saving to incorporate relevant metadata.
- Retriever: I was unaware of different retriever types and algorithms. I simply used OpenAI’s relevance search to retrieve a hardcoded number of results.
- Memory: To hold a conversation, the chatbot needs to remember what the user said previously. This is where the limited context window of gpt-3.5 (at that time) forced a clear trade-off between chunk size and how many results to retrieve.
- For example, if your chunk size is 800 tokens and the retriever returns the top 8 results, that is 6,400 tokens, or more than 50% of the old model’s limit.
- And that is just one question, so you can imagine how quickly the memory fills up in a multi-turn conversation.
- One way to solve this is to use a smaller chunk size and have the retriever return fewer results, but with a basic retriever (above), that means the model doesn’t have comprehensive information to answer the question.
- I didn’t even use an IDE at all. All of the code was edited using TextEdit on Mac 😀 (Did I tell you that I was a noob before? :P)
- I can go on and on but I guess you get the picture.
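To make the chunking and embedding problems concrete, here is a minimal sketch of roughly what v0.1 did. This is illustrative only, not my actual code: the chunk size, batch size and pause length are arbitrary assumptions, and it assumes the tiktoken and openai packages.

import time
import tiktoken
from openai import OpenAI

client = OpenAI()
encoding = tiktoken.get_encoding("cl100k_base")

def naive_chunk(text, max_tokens=800):
    # Split purely on token count; sentences can get cut mid-way
    tokens = encoding.encode(text)
    return [
        encoding.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def embed_in_batches(chunks, batch_size=50, pause_seconds=5):
    # Batch the requests and sleep between batches to stay under rate limits
    embeddings = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-ada-002", input=batch
        )
        embeddings.extend(item.embedding for item in response.data)
        time.sleep(pause_seconds)
    return embeddings

Every chunk boundary here falls on a token count, not a sentence boundary, which is exactly why answers often came back with clipped context.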
My “valley of death”
Eager to improve beyond the limitations of v0.1, I attempted several online courses, hoping they’d provide the missing pieces to level up my skills. But my attempts only led to dead ends.
I struggled through exercises on vector databases (Vector Databases: from Embeddings to Applications with Weaviate), evaluative RAG methods (Building and Evaluating Advanced RAG Applications with Llama-Index and Truera), and advanced retrieval techniques (Advanced Retrieval for AI with Chroma). Try as I might, I couldn’t connect the theory to practical application using my own blog data.
Were the courses poorly designed? No—the shortcoming was my own lack of underlying knowledge. Still, failure after failure was massively frustrating, not to mention demoralizing. I found myself in a figurative valley, unsure how to press on.
Incremental Wins Over Time
Nuggets of value did emerge from repeated failures though:
- Adoption of VS Code over TextEdit
- Leveraging GitHub Copilot extensions
- Appreciating Jupyter Notebook as a development environment
The last course mentioned LangChain, a popular new framework for building chatbots. I actually attempted LangChain tutorials months ago (“LangChain: Chat with Your Data” and “Functions, Tools and Agents with LangChain”) without much luck. But now, with hard-won knowledge under my belt, revisiting its docs proved illuminating. Concepts clicked together and its modular architecture made intuitive sense.
I could envision adapting LangChain’s robust capabilities to my passion project. Finally, a way forward revealed itself! One step at a time, I oriented myself with its pipelines for data ingestion, embedding, storage and retrieval.
My confidence grew with each piece I managed to implement. V2 began taking shape…
Rebuilding the Foundation
With LangChain as my guide, I set out to reconstruct my chatbot from the ground up:
Ingesting WordPress Exports into JSON
I created a custom script to transform my WordPress export into consumable JSON documents for LangChain. The script splits content into individual post files as well, maintaining organization. (The file name is: “xml_to_json_html.py” and it can be found here.)
After some trial and error tweaking ingestion parameters, LangChain’s JSONLoader properly parsed my exported posts. Now validated input could fuel downstream pipelines. (The code for this step is in the file “DataIngestionAndIndexing.ipynb” in this public Github repo.)
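For illustration, here is a minimal sketch of what that loading step can look like. The file path, jq_schema, content key and metadata fields below are hypothetical placeholders; my actual parameters are in the notebook linked above. (JSONLoader also requires the jq package.)

from langchain_community.document_loaders import JSONLoader

def extract_metadata(record, metadata):
    # Attach the post URL and publish date so the chatbot can cite sources
    # ("url" and "date" are assumed field names in my exported JSON)
    metadata["url"] = record.get("url")
    metadata["date"] = record.get("date")
    return metadata

loader = JSONLoader(
    file_path="posts/example_post.json",  # hypothetical path
    jq_schema=".",                        # depends on the shape of your JSON
    content_key="content",                # assumed key holding the post body
    metadata_func=extract_metadata,
)
documents = loader.load()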
Automated Text Splitting
My naive token-length chunking was replaced by LangChain’s SentenceTransformersTokenTextSplitter, which splits text using a sentence-transformers tokenizer. No more disjointed sentences cut midway! The configuration kept chunks appropriately sized for memory-constrained models.
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

# Define the token splitter with specific configurations
token_splitter = SentenceTransformersTokenTextSplitter(
    chunk_overlap=0,       # Overlap between chunks
    tokens_per_chunk=256   # Number of tokens per chunk
)

# Split the documents (loaded by JSONLoader above) into chunks based on tokens
all_splits = token_splitter.split_documents(documents)
print(f"Total document splits: {len(all_splits)}")
Generating Embeddings and Indexes
Past struggles with OpenAI API limits vanished thanks to LangChain’s wrapper for OpenAI embeddings. In just two lines of code, the embeddings cleanly captured the salient features of the split text.
For the vector store, I opted for FAISS (Facebook AI Similarity Search) over Weaviate or Chroma. Industry-tested FAISS struck the right balance of capability versus complexity for my needs. Its CPU version rapidly indexed my blog chunks, outputting a compact, searchable database. I don’t need to worry about batching or hitting OpenAI’s API request limits anymore.
LangChain supports multiple vector stores, so you can check them out here.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings and FAISS vector store
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(all_splits, embeddings)

# Save the vector store locally
db.save_local("path/to/save/faiss_index")  # Placeholder for save path / index name
Two lines of code! That is it.
One thing I also noticed: if your content is not small (I have 400+ blog posts), don’t try to use the retriever immediately after generating the embeddings, because the index needs a bit of time to finish building and become stable.
Evaluating Retrievers
The more time I spend building the chatbot, the more I appreciate how important the retriever is. Getting usable context is integral to answer quality. I evaluated multiple built-in retrievers (similarity score with top_k, multi-query retriever, contextual compression, parent document retriever) by manually assessing responses to test queries. The “Max Marginal Relevance” algorithm emerged as the leader, balancing diversity, speed, relevance and cost.
I had to do this manually because I still couldn’t get TruLens Eval to work; I kept running into an error importing tru_llama.py.
Setting up this retriever takes just 1 line of code :D, using FAISS as the vector store.
retriever = db.as_retriever(search_type="mmr")
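If you want more control over the trade-offs, the same call also accepts search_kwargs. A sketch using what I understand to be the library defaults; the numbers are starting points to tune, not my production settings:

retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,             # number of documents returned
        "fetch_k": 20,      # candidates fetched before MMR re-ranking
        "lambda_mult": 0.5  # 1 favors relevance, 0 favors diversity
    },
)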
With my index and retriever locked in, I had data pipelines ready to fuel an intelligent chatbot!
Architecting Conversational Agents
I decided to use the agent framework from LangChain to build this chatbot. Is it overkill at this point? Yes, it is. But my hope is that over time, I can evolve this chatbot and give it more “tools”, aka functionalities. LangChain makes it super easy to set up the agent and give it tools to use.
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.tools.retriever import create_retriever_tool
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
db = FAISS.load_local("path/to/your/faiss_index_file", embeddings)  # Replace with your actual FAISS index file path
retriever = db.as_retriever(search_type="mmr")

tool = create_retriever_tool(
    retriever,
    "search_your_blog",           # Replace with a descriptive name for your tool
    "Your tool description here"  # Provide a brief description of what your tool does
)
tools = [tool]

prompt_template = ChatPromptTemplate.from_messages([
    # Customize the prompt template according to your chatbot's persona and requirements
])

llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0)
llm_with_tools = llm.bind_tools(tools)

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt_template
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
The agent uses chat_history to manage memory.
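In practice, that means the calling code maintains the history and passes it in on every turn. A minimal sketch of the invocation loop (the question text is made up):

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []
question = "What did you write about digital marketing?"
result = agent_executor.invoke({"input": question, "chat_history": chat_history})
print(result["output"])

# Append the turn so the next question can refer back to this one
chat_history.extend([
    HumanMessage(content=question),
    AIMessage(content=result["output"]),
])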
The full, final Python code for the agent
You can find the code in the GitHub repo here. The file name is example_app.py. You can see some basic security measures implemented in the code. So PLEASE don’t try to hack me. I am a newbie. Leave constructive feedback or code suggestions for me on GitHub.
Last but not least, if you want to try the chatbot v2, here it is.
“Is it weird that the chatbot doesn’t know anything about you, Chandler?”
P.S.: Thank you to those of you who reached out to let me know that the chatbot doesn’t know anything about me. And you are right! It is because I forgot to export the “About” page and only exported the published posts. This is the second time I have forgotten to do this, so I will include basic questions about me in the list of eval questions. Lesson learned!
Thank you everyone for sharing your constructive feedback. Please keep them coming. And yes I know the chatbot is very slow to start so I am working on that too. 😐 (Did I tell you that I am a noob before? :P)
A quick update
The issue with the chatbot not knowing anything about me is now fixed. This is what I did and learned:
- Exported the “About me” page from WordPress to .XML per the above.
- Ran “xml_to_json_html.py”.
- Performed text splitting and generated embeddings using FAISS as above. Saved the vector store locally under a different name to test the retriever.
# Save the vector store to the local machine
db.save_local("faiss_index_about")

# Set up a retriever to test the new vector store about Chandler
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(), llm=llm
)

# Test the retriever
question = "Who is Chandler Nguyen?"
results = retriever_from_llm.get_relevant_documents(query=question)
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
- As it turned out, the process to merge two FAISS vector stores is surprisingly simple, per the documentation here.
# Merge two FAISS vector stores into one
# Load the vector stores from the local machine
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
db_1 = FAISS.load_local("faiss_index_about", embeddings)
db_2 = FAISS.load_local("faiss_index", embeddings)
db_2.merge_from(db_1)

# Save the merged vector store to the local machine
db_2.save_local("faiss_index_v2")

# Test the new vector store to confirm the correct documents are retrieved
retriever = db_2.as_retriever(search_type="mmr")
results = retriever.get_relevant_documents("Who is Chandler Nguyen?")
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
- After that, it is pretty much the same process as above, using the new vector store.
Share this with a friend
If you enjoyed this article and found it valuable, I’d greatly appreciate it if you could share it with your friends or anyone else who might be interested in this topic. Simply send them the link to this post, or share it on your favorite social media platforms. Your support helps me reach more readers and continue providing valuable content.
Feb 14 update: Chatbot v2.10 Unveiled
Two weeks after this deployment of the chatbot, I introduced version 2.10, which elevates the user experience with enhanced speed, scalability, and simplicity. You can read more about it here.
Mar 25 update: From Frontend Upgrades to Docker Struggles and Breakthroughs
You can read more about it here.