I’ve been debating whether to even write this update about my coding journey. The progress hasn’t been flashy or externally visible—so why bother?
Well, maybe this post will serve as a good reminder for my future self that progress can slow at times and growth often happens in spurts.
Since my last post about building a self-evaluating chatbot in mid-April, I’ve dived into a few courses:
- React Basics by Meta on Coursera
- React Advanced by Meta on Coursera
- Django web framework
- The Full Stack
Why These Courses?
I want to share my personal experiences and thought process behind choosing these courses. My hope is that it may help those of you who are (or will be) on a similar journey.
Becoming Better at Instructing the Machine
It has been about 10 months since I started dabbling in coding while holding down a full-time job. (You can read more in How I Built My Own Chatbot with No Coding Experience: Lessons Learned, published in October 2023, here.)
I definitely could NOT have made it without the help of generative AI (whether ChatGPT, Anthropic’s Claude, or GitHub Copilot in Visual Studio Code). However, the more I code, the more I realize that while these foundation models are powerful, they’re still like interns. They’re not yet good at planning or solving ambiguous tasks. The more specific our instructions, the better the output.
That means I need to become increasingly specific with my coding instructions. But how? So far, I’ve been describing what I want in natural language. While this works at a basic level, it’s inadequate for more complex tasks.
The next logical question, then, is: what should I learn?
Focus on the Business/User Problems I’m Trying to Solve
My chatbot (code-named Sydney) for this website was (perhaps “is”) slow. The slowness shows up in two places:
- At startup
- During the conversation
Streaming isn’t enabled, so users have to wait a relatively long time to see the complete answer.
So my goal was (is) simple: make it fast and responsive.
But without more foundational knowledge, it’s challenging to break this down into manageable problems to solve. That’s why I decided to learn more about scalable, secure, and fast front-end frameworks, back-end frameworks, and the full stack (including databases).
My experiences with React and Django so far
- React: React is easier to learn and deploy than Django. (Duh!) The current chatbot front end uses React, and I’ve found that Claude and ChatGPT can handle most of the coding tasks without issues. Deploying React to a production environment was quite simple too, and the chatbot page seems more responsive during conversations.
- Django: Django and Django REST Framework (DRF) had a steeper learning curve, particularly around deploying with a Cloud SQL database and getting security right. Because Django is so comprehensive, reaching the right level of security in deployment took some time. However, I still CANNOT get streaming HTML answers to work with DRF in a production environment! I could make it work using WebSockets, but that approach deviates significantly from DRF, so I haven’t adopted it yet.
So if you know a framework, or a way to make LangGraph streaming work with a simple front end, let me know!
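One direction that might be worth trying, offered as an untested sketch rather than a confirmed fix, is server-sent events (SSE) over plain HTTP instead of WebSockets. The generator below only formats tokens as SSE messages. The function name `sse_events` and the `{"token": ...}` payload shape are hypothetical, and the token source is a plain list standing in for whatever the model or graph yields.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Format each token as one server-sent-event message.

    An SSE message is a "data: ..." line followed by a blank line.
    JSON-encoding the payload keeps newlines inside a token from
    breaking the message framing.
    """
    for token in tokens:
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the front end knows when to stop.
    yield "event: done\ndata: {}\n\n"


if __name__ == "__main__":
    # Stand-in token source; a real one would come from the model.
    for message in sse_events(["Hello", ", ", "world"]):
        print(message, end="")
```

In Django, a generator like this could be handed to `django.http.StreamingHttpResponse` with `content_type="text/event-stream"` from a regular view rather than a DRF `Response`. Whether that streams correctly in a given production setup is an assumption to verify, since proxies and servers in between may buffer the response.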
What About the Financial Chatbot I Mentioned Back in April?
The good news is that this project has piqued the interest of many of you—I’ve received numerous questions and comments. The bad news? I’m nowhere near completing it. Here is why:
1. Balancing retrieval quality, vector store monthly cost, and embedding cost
As mentioned in the April post, I am evaluating Weaviate’s Serverless Cloud Vector Store. It costs real money, though. For this project, I’m likely looking at:
- Data Volume: More than 10 million, possibly 12.5 to 15 million, data objects.
- Cost Estimate: With a vector dimension of 512, the monthly cost could range from $600 to $750.
This amount is manageable for a company in a production environment but might be too steep for a side project like this. So, I need to find ways to lower the cost.
Based on how Weaviate pricing works, the two levers are: reducing the number of data objects or reducing the vector dimension. Both of these come with different tradeoffs.
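To make the two levers concrete, here is a rough cost model that assumes the bill scales linearly with the number of stored vector dimensions (objects times dimension). The rate of $0.095 per million dimensions per month is an assumption picked because it roughly reproduces the $600 to $750 range above; check the current pricing page before relying on it.

```python
# ASSUMPTION: USD per 1 million stored vector dimensions per month.
# Illustrative only; the real rate may differ by plan and over time.
ASSUMED_USD_PER_MILLION_DIMS = 0.095


def monthly_storage_cost(
    num_objects: int,
    vector_dim: int,
    usd_per_million_dims: float = ASSUMED_USD_PER_MILLION_DIMS,
) -> float:
    """Estimate monthly cost as (objects x dimensions) x rate."""
    total_dims = num_objects * vector_dim
    return total_dims / 1_000_000 * usd_per_million_dims


if __name__ == "__main__":
    for num_objects in (12_500_000, 15_000_000):
        for dim in (512, 256):
            cost = monthly_storage_cost(num_objects, dim)
            print(f"{num_objects:>12,} objects x {dim} dims: ${cost:,.2f}/month")
```

Under this model, halving either the object count or the dimension halves the bill, which is why both levers matter equally on the cost side even though their quality tradeoffs differ.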
Ways to reduce data objects
- Aggressive Text Cleaning: Reducing the overall text size in the 10-K and 10-Q filings could help, considering we’re dealing with the 500 companies in the S&P 500, each with multiple filings over the past 10 years. I think I have exhausted this option.
- Increasing Chunk Size: By increasing the chunk size from 256 to 512, 1,000, 2,000, or even 3,000, I can cut the number of data objects by roughly 2x, 4x, or 10x. However, this may lower retrieval quality, as larger chunks tend to be less precise.
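The effect of chunk size on object count can be seen with a toy chunker. This is a minimal sketch that splits on whitespace and uses no overlap; the post does not say whether chunk sizes are measured in tokens, words, or characters, so word-based splitting here is an assumption, and `chunk_words` is a hypothetical helper name.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


if __name__ == "__main__":
    # Stand-in for one filing: 30,000 placeholder words.
    filing_text = " ".join(f"w{i}" for i in range(30_000))
    for size in (256, 512, 1_000, 2_000, 3_000):
        print(f"chunk size {size:>5}: {len(chunk_words(filing_text, size))} objects")
```

Each chunk becomes one data object in the vector store, so the object count falls roughly in proportion to the chunk size.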
Ways to reduce vector dimension
Vector dimension seems directly correlated with retrieval quality: the lower the dimension, the worse the retrieval. I’m still experimenting with different embedding models like OpenAI’s “text-embedding-3-small” and “text-embedding-3-large” at various dimensions. If you have any good recommendations, please share!
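For reference, OpenAI’s text-embedding-3 models accept a `dimensions` parameter to request shorter vectors. The sketch below shows the manual equivalent on an embedding that has already been generated: keep the leading values, then rescale to unit length. Whether this matches the setup used in the experiments above is an assumption, and `shorten_embedding` is a hypothetical helper name.

```python
import math


def shorten_embedding(vector: list[float], target_dim: int) -> list[float]:
    """Keep the first target_dim values and L2-normalize the result."""
    if not 0 < target_dim <= len(vector):
        raise ValueError("target_dim must be between 1 and len(vector)")
    truncated = vector[:target_dim]
    norm = math.sqrt(sum(value * value for value in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return truncated
    return [value / norm for value in truncated]


if __name__ == "__main__":
    toy_embedding = [3.0, 4.0, 12.0]  # toy values, not a real embedding
    print(shorten_embedding(toy_embedding, 2))
```

Shortening this way makes it cheap to compare several dimensions from a single embedding run, though the quality at each size still has to be measured on real queries.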
Using the “text-embedding-3-large” model from OpenAI also adds cost during the embedding phase. For example, it cost me $4.50 to generate embeddings for 23 companies’ 10-K/10-Q filings over the past 10 years with a vector dimension of 1024. Extrapolating linearly, the entire S&P 500 could cost around $100. (Yes, I’m a bit of a cheapskate. Did you not know that already 😛 ?)
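The extrapolation is simple proportional scaling. The sketch below assumes the 23 sampled companies are representative of the rest in filing length, which may not hold.

```python
def extrapolate_embedding_cost(
    sample_cost_usd: float,
    sample_companies: int,
    target_companies: int,
) -> float:
    """Scale a sample embedding cost linearly to a larger set of companies."""
    per_company = sample_cost_usd / sample_companies
    return per_company * target_companies


if __name__ == "__main__":
    estimate = extrapolate_embedding_cost(4.50, 23, 500)
    print(f"Estimated one-time embedding cost: ${estimate:.2f}")
```

Unlike the vector store fee, this is a one-time cost per embedding run, but it is paid again every time the chunk size or embedding model changes.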
2. Latency during query translation, structuring, retrieval
The more elaborate the query translation and structuring, the better the content search terms we can generate, leading to higher-quality retrieval. However, this also increases latency.
- Query translation is the step where we ask the model to break the initial user question into sub-questions that are better suited for retrieval. For example, given the original question “How did Airbnb’s operating margin change from 2020 to 2022?”, the model can generate sub-questions like:
[
"What was Airbnb's operating margin in the 10-K filing for the year 2020?",
"What was Airbnb's operating margin in the 10-K filing for the year 2021?",
"What was Airbnb's operating margin in the 10-K filing for the year 2022?"
]
- Query structuring is the step where we ask the model to turn each sub-question into a content search query for retrieval, together with relevant metadata filters. Using the above example, for the second sub-question, the model may return:
{
"content_search": "What was Airbnb's operating margin in the 10-K filing for the year 2021?",
"company_conformed_name": "Airbnb, Inc.",
"form_type": "10-K",
"conformed_period_of_report": "2021-12-31"
}
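The two steps above can be sketched as two small functions around a pluggable model call. This is an illustration under assumptions, not the actual pipeline: the prompts, the function names, and the `StructuredQuery` class are hypothetical, and the two `fake_*` functions return canned output in place of a real model.

```python
import json
from dataclasses import dataclass
from typing import Callable

TRANSLATE_PROMPT = (
    "Break the question below into sub-questions that are easier to retrieve "
    "for. Return only a JSON array of strings.\n\nQuestion: {question}"
)
STRUCTURE_PROMPT = (
    "Turn the question below into a JSON object with the keys content_search, "
    "company_conformed_name, form_type, conformed_period_of_report.\n\n"
    "Question: {question}"
)


@dataclass
class StructuredQuery:
    content_search: str
    company_conformed_name: str
    form_type: str
    conformed_period_of_report: str


def translate_query(question: str, llm: Callable[[str], str]) -> list[str]:
    """Step 1: ask the model for a JSON array of sub-questions."""
    sub_questions = json.loads(llm(TRANSLATE_PROMPT.format(question=question)))
    if not isinstance(sub_questions, list) or not all(
        isinstance(item, str) for item in sub_questions
    ):
        raise ValueError("expected a JSON array of strings")
    return sub_questions


def structure_query(sub_question: str, llm: Callable[[str], str]) -> StructuredQuery:
    """Step 2: ask the model for search text plus metadata filters."""
    fields = json.loads(llm(STRUCTURE_PROMPT.format(question=sub_question)))
    return StructuredQuery(**fields)


def fake_translation_llm(prompt: str) -> str:
    # Canned output standing in for a real model call.
    return json.dumps([
        f"What was Airbnb's operating margin in the 10-K filing for the year {year}?"
        for year in (2020, 2021, 2022)
    ])


def fake_structuring_llm(prompt: str) -> str:
    # Canned output standing in for a real model call.
    return json.dumps({
        "content_search": "What was Airbnb's operating margin in the 10-K filing for the year 2021?",
        "company_conformed_name": "Airbnb, Inc.",
        "form_type": "10-K",
        "conformed_period_of_report": "2021-12-31",
    })


if __name__ == "__main__":
    subs = translate_query(
        "How did Airbnb's operating margin change from 2020 to 2022?",
        fake_translation_llm,
    )
    print(structure_query(subs[1], fake_structuring_llm))
```

Note that this design makes one model call for translation plus one per sub-question for structuring, all before retrieval even starts, which is where the added latency comes from.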
For retrieval, the more results we return, the longer it will take for the model to process. Returning fewer results means faster response times but potentially lower quality answers due to a lack of context.
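The tradeoff comes down to a single `top_k` knob applied after the metadata filters. The sketch below runs over an in-memory list rather than a real vector store, and uses naive keyword overlap as a stand-in for vector similarity; both are assumptions made to keep the example self-contained, and the function names are hypothetical.

```python
def keyword_score(query: str, text: str) -> int:
    """Count distinct words shared by query and text (stand-in for similarity)."""
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split()))


def retrieve(chunks: list[dict], query: str, filters: dict, top_k: int) -> list[dict]:
    """Apply exact-match metadata filters, rank by score, keep the top_k best."""
    candidates = [
        chunk for chunk in chunks
        if all(chunk.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda chunk: keyword_score(query, chunk["text"]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = [
        {"text": "operating margin improved in 2021", "form_type": "10-K"},
        {"text": "risk factors and legal proceedings", "form_type": "10-K"},
        {"text": "operating margin for the quarter", "form_type": "10-Q"},
    ]
    for top_k in (1, 5):
        results = retrieve(chunks, "operating margin 2021", {"form_type": "10-K"}, top_k)
        print(top_k, [chunk["text"] for chunk in results])
```

A small `top_k` keeps the prompt short and the answer fast, while a larger one passes more context to the model at the cost of processing time.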
Wrapping it up
There’s more I’m tackling, but since this post is already longer than expected, I’ll wrap it up here. As always, feel free to leave your questions or suggestions below.
Chandler