S&P500 Agent MVP Launched: Answering Financial Questions Grounded with SEC Data

Hi there,

I’m happy to share that, since my last post about my struggles with the S&P500 Agent, the MVP version is ready! Let me walk you through what this minimum viable product can do and how it came to be.

What Can This MVP Do?

  1. Dive into a Decade of Data: The agent’s database includes the last 10 years of company facts submitted to SEC EDGAR. Pretty neat, right?
  2. Reliable Answers: Because it’s grounded using actual facts submitted to the SEC, you can trust the quality of the answers.
  3. Fact-Check Friendly: The agent always includes reference data in its final answer. So if you’re feeling skeptical, you can check it yourself!
  4. Handles Tricky Questions: It can tackle semi-complex queries like “Compare the revenue of Apple and Microsoft between 2020 and 2022” or “How did Microsoft’s operating margin change from 2020 to 2022?” Why semi-complex? Well, the agent needs to “reason”: break these broad questions down into smaller ones, look up information in the database, and then piece it all together.
    • For example, to answer the first question, the agent needs each company’s revenue for 2020, 2021, and 2022, and then it can do the comparison.
  5. Up-to-Date (Almost): The cut-off date is August 2024. Anything submitted to the SEC after that isn’t in this MVP version.
  6. HTML Streaming: Streaming works! Yay! As it turns out, DRF and React natively support streaming, so together with LangGraph, we’ve got a responsive conversation flow displayed in an easy-to-read format. I had struggled with HTML streaming for a while.
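For the curious, the core of the streaming pattern is just a Python generator that yields chunks as they are produced. Below is a minimal, illustrative sketch of that pattern, not the app’s actual code; on the Django side the generator would be wrapped in a StreamingHttpResponse so the React client can render partial HTML as it arrives:

```python
# Illustrative sketch (not the app's actual code): a generator that yields an
# HTML answer in small chunks, the way an LLM emits tokens. Django's
# StreamingHttpResponse sends each chunk to the client as soon as it is
# yielded, instead of waiting for the full answer.
def token_stream(answer_html, chunk_size=8):
    """Yield an HTML answer in small chunks."""
    for i in range(0, len(answer_html), chunk_size):
        yield answer_html[i:i + chunk_size]

# On the Django side this would be roughly:
#   StreamingHttpResponse(token_stream(html), content_type="text/html")
for chunk in token_stream("<p>The answer streams in pieces.</p>"):
    print(chunk, end="")
```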

A Peek Under the Hood

If you’re still reading at this point, you probably want to know more about how I built this MVP and overcame some of the challenges I mentioned before. So, let’s get a bit technical!

  1. Slimmed-Down Data: Instead of using full 10-K or 10-Q reports, I’m using the “Facts” submitted to SEC EDGAR. This means the database is significantly smaller – less than 2GB! Below are some examples of the facts public companies submit to the SEC:
  "AccruedRoyaltiesCurrent": {
    "label": "Accrued Royalties, Current"
  },
  "AccumulatedDepreciationDepletionAndAmortizationPropertyPlantAndEquipment": {
    "label": "Accumulated Depreciation, Depletion and Amortization, Property, Plant, and Equipment"
  },
  "AccumulatedOtherComprehensiveIncomeLossNetOfTax": {
    "label": "Accumulated Other Comprehensive Income (Loss), Net of Tax"
  },
  "AccumulatedOtherComprehensiveIncomeLossOtherThanTemporaryImpairmentNotCreditLossNetOfTaxAvailableforsaleDebtSecurities": {
    "label": "Accumulated Other Comprehensive Income (Loss), Other than Temporary Impairment, Not Credit Loss, Net of Tax, Available-for-sale, Debt Securities"
  },
  "AdditionalPaidInCapitalCommonStock": {
    "label": "Additional Paid in Capital, Common Stock"

This also means that I don’t need a large-scale, fast vector store, which would be very expensive (in the range of $600 – $700/month).
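These facts follow the layout of EDGAR’s public companyfacts JSON (concept, label, then dated values grouped by unit). As a hedged sketch of how annual figures can be pulled out of that structure – the field names follow the public API, but the sample data below is made up for illustration:

```python
# Sketch of extracting annual figures from the companyfacts JSON layout served
# by data.sec.gov/api/xbrl/companyfacts. The sample below is fabricated for
# illustration; real responses carry many more facts and unit entries.
sample = {
    "facts": {
        "us-gaap": {
            "AdditionalPaidInCapitalCommonStock": {
                "label": "Additional Paid in Capital, Common Stock",
                "units": {"USD": [
                    {"fy": 2022, "fp": "FY", "form": "10-K", "val": 1230000000},
                ]},
            }
        }
    }
}

def annual_values(companyfacts, concept, taxonomy="us-gaap", unit="USD"):
    """Return {fiscal_year: value} for a concept, keeping 10-K full-year rows only."""
    fact = companyfacts["facts"][taxonomy][concept]
    return {
        row["fy"]: row["val"]
        for row in fact["units"][unit]
        if row["form"] == "10-K" and row["fp"] == "FY"
    }
```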

2. Cloud SQL PostgreSQL as the main database: Since I was already using Cloud Run CI/CD on GCP, it made sense to stick with Google’s offerings. This was a brand-new area for me, so I had to learn how to migrate the database from local to the cloud and then how to configure the backend to connect to the Cloud SQL instance over private IP. The documentation from GCP has been helpful.

3. React Meets Django: This is my first rodeo successfully deploying both a React frontend and a Django Rest Framework (DRF) backend with a Cloud SQL database. It’s been a journey of trial and error, but OpenAI’s o1-preview and Anthropic’s Claude 3.5 Sonnet have been a huge help. They are indeed very powerful, especially when you give them enough context about the problem you are solving.

4. Smart Agent: The agent is built with LangGraph. It has two main tools: Google Search and a Financial Question Answering tool. It decides which to use based on your question.
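To give a feel for the routing decision, here is a toy illustration. In the real agent the LLM inside the LangGraph graph makes this choice, not keyword matching; the function below is only a stand-in to show the idea:

```python
# Toy illustration of the agent's tool choice. The actual routing is done by
# the LLM inside the LangGraph agent; keyword matching here is a stand-in.
def choose_tool(question: str) -> str:
    financial_terms = ("revenue", "margin", "income", "earnings", "10-k", "10-q")
    if any(term in question.lower() for term in financial_terms):
        return "financial_qa"   # query the SEC facts database
    return "google_search"      # fall back to web search

print(choose_tool("Compare the revenue of Apple and Microsoft"))  # financial_qa
print(choose_tool("Who is the CEO of Apple?"))                    # google_search
```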

5. How are SQL queries generated?

It is not straightforward to convert a generic, broad user question into suitable SQL queries that fit the database. One of the main reasons is that when companies submit their financial facts to the SEC, they use financial concepts and terminology that are not intuitive to most people.

For example, “revenue” can be represented by multiple facts like:

  • Revenue
  • Revenue from Contract with Customer, Excluding Assessed Tax
  • Revenue from Related Parties
  • Deferred Revenue
  • etc…

And across companies, the same concept can appear under different labels. For example, Apple uses “Revenue from Contract with Customer, Excluding Assessed Tax” while Tesla uses “Revenue”.

Or when you ask for “operating margin”, that may not be a fact the company submits to the SEC at all. Instead, it submits total revenue and Operating Income (Loss), and the margin has to be derived from those.
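As a tiny worked example of that derivation (the figures are illustrative placeholders, not any company’s actual filings):

```python
# Operating margin is derived, not filed directly: it is operating income
# divided by total revenue. Figures below are illustrative placeholders.
def operating_margin(operating_income, total_revenue):
    return operating_income / total_revenue

margin = operating_margin(83_000_000_000, 198_000_000_000)
print(f"{margin:.1%}")  # 41.9%
```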

Another situation is when a company has changed its name over time, so the SQL queries need to include both the old name and the new name.

So how do I instruct the machine to select the right fact/label for a particular broad question?

I use hybrid search with Weaviate to find the most relevant facts or labels and send these back to the LLM.
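Conceptually, hybrid search blends a keyword match score with a vector similarity score (Weaviate mixes BM25 and vector results, weighted by an alpha parameter). Here is a toy pure-Python sketch of that idea, with made-up three-dimensional “embeddings”; it is not the production setup:

```python
# Toy sketch of hybrid scoring: blend a keyword-overlap score with a vector
# similarity score, the way hybrid search mixes BM25 and vector results via an
# alpha weight. Embeddings here are fake 3-d vectors for illustration.
import math

def keyword_score(query, label):
    """Fraction of query words that appear in the label."""
    q, l = set(query.lower().split()), set(label.lower().split())
    return len(q & l) / len(q)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def hybrid_score(query, label, q_vec, label_vec, alpha=0.5):
    # alpha=1.0 -> pure vector search; alpha=0.0 -> pure keyword search
    return alpha * cosine(q_vec, label_vec) + (1 - alpha) * keyword_score(query, label)

labels = {
    "Revenue from Contract with Customer, Excluding Assessed Tax": [0.9, 0.1, 0.2],
    "Deferred Revenue": [0.3, 0.8, 0.4],
}
q_vec = [0.85, 0.15, 0.25]  # pretend embedding of "total revenue"
best = max(labels, key=lambda l: hybrid_score("total revenue", l, q_vec, labels[l]))
print(best)
```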


6. Breaking Testing Down Is Key

Before the final deployment to production, I broke the process down into steps and tested each one:

1. Local test of the DRF backend with a local database.

2. Docker container backend test with local database.

3. Migrating the PostgreSQL database to Cloud SQL and testing it.

4. Docker container backend test with Cloud SQL.

5. React frontend local test.

6. React frontend with production backend and Cloud SQL.

7. Migrating the React frontend to WordPress in production.

8. End-to-end testing.


I had to learn this step-by-step approach the hard way. Initially, I skipped some steps and tested in bigger chunks, but the large language models (LLMs) couldn’t pinpoint exactly where things went wrong.

What’s next?

  1. Fix the cold start problem: Right now, it takes a bit of time for the agent to answer the first question. I’m working on a cost-effective solution.
  2. Show interim steps to users: While answering complex questions like “Compare the revenue of Apple and Microsoft between 2020 and 2022,” it takes time for the agent to generate the final answer. Showing the steps the agent is working through in real-time might improve user experience.
  3. Enrich with 10-K/10-Q text content: I plan to include more relevant textual content from 10-K and 10-Q filings to provide more detailed answers.

That’s it from me for now. Give the MVP a try and let me know what you think below, or email me directly (chandler@chandlernguyen.com).
