I Cheated: Sydney Can Now Read the Narrative in 10-K Reports

Roughly a month after launching Sydney’s MVP version, I’m excited to share that Sydney can now dive deep into the written content of the annual reports (10-Ks) filed over the past decade by the “Magnificent 7” tech companies! (That’s Apple, Amazon, Alphabet, Facebook/Meta, Microsoft, Nvidia and Tesla.)
Before, Sydney could only answer questions about financial facts and numbers for the entire S&P 500; it couldn’t interpret the narrative sections of the reports. This upgrade took some strategizing: I had to balance chunk size, embedding dimensions, and the sheer volume of reports included in the vector store.
So, yes, I did “cheat” a little in this version :P.

Here’s how I made it work:

  1. Focused Scope: Instead of covering every company in the S&P 500, I added just seven key players to the vector store.
  2. 10-K Reports Only: I included only annual reports (10-Ks) from the last 10 years, skipping quarterly (10-Q) reports. The result? Nearly 700,000 data objects in the vector store. If I added quarterly reports, that number would skyrocket (and the monthly cost would increase accordingly).
  3. Text Embedding: I settled on OpenAI’s “text-embedding-3-small” model, with embeddings shortened to 512 dimensions.
    • Why not use “text-embedding-3-large”? The cost difference is over tenfold! And the hybrid search quality seems good enough with the current setup.
    • Why not go for 1024 or 1536 dimensions? Again, costs come into play. 512 dimensions keep monthly vector store expenses reasonable.
  4. A New Tool Just for This Task: I equipped Sydney with a tool dedicated to the narrative content of these seven companies’ annual reports. If you want hard numbers for the entire S&P 500, you can still get them through a separate tool.
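
To make the dimension trade-off a bit more concrete, here is a rough sketch (not Sydney's actual code). It estimates the raw size of the stored vectors at different dimension counts and shows how 512-dimension embeddings can be requested through the `dimensions` parameter of OpenAI's embeddings endpoint. The 4-bytes-per-value figure and the helper names `raw_vector_bytes` and `embed_chunks` are my assumptions for illustration; real vector stores add index and metadata overhead on top, and pricing depends on the provider.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each vector component is stored as a 32-bit float


def raw_vector_bytes(num_objects: int, dimensions: int) -> int:
    """Raw bytes needed for the vectors alone (no index, no metadata)."""
    return num_objects * dimensions * BYTES_PER_FLOAT32


def embed_chunks(client, chunks, dimensions=512):
    """Embed text chunks with text-embedding-3-small, shortened to `dimensions`.

    `client` is expected to behave like an `openai.OpenAI()` instance.
    This helper is hypothetical; it only wraps the embeddings call.
    """
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunks,
        dimensions=dimensions,
    )
    return [item.embedding for item in response.data]


if __name__ == "__main__":
    # ~700,000 objects is the figure quoted above for ten years of 10-Ks.
    for dims in (512, 1024, 1536):
        gib = raw_vector_bytes(700_000, dims) / 2**30
        print(f"{dims:>4} dims: {gib:.2f} GiB of raw vectors")
```

The raw vector footprint grows linearly with the dimension count, so 1536 dimensions would take three times the space of 512 for the same number of objects.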

With the new tool, Sydney can now answer questions like:

  • “What did Nvidia discuss about their chip architecture last year?”
  • “Did Apple identify any key iPhone competitors in 2022?”
  • “How has Microsoft described Azure competition over the past 5 years?”

All answers are grounded in content taken directly from the 10-K reports themselves.
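
For the curious, the split between the two tools could be pictured roughly like this. It is only an illustrative sketch: the tool names, the ticker set and the `pick_tool` helper are all hypothetical, and in a real agent the language model usually chooses the tool from its description rather than through hard-coded rules.

```python
# Hypothetical tickers for the seven companies covered by the narrative tool.
NARRATIVE_TICKERS = {"AAPL", "AMZN", "GOOGL", "META", "MSFT", "NVDA", "TSLA"}


def pick_tool(ticker: str, wants_narrative: bool) -> str:
    """Return the (hypothetical) name of the tool that should handle a question."""
    if not wants_narrative:
        # Financial facts and numbers are available for the whole S&P 500.
        return "financial_facts"
    if ticker.upper() in NARRATIVE_TICKERS:
        # Narrative questions are answered from the 10-K vector store.
        return "narrative_10k_search"
    # Narrative content for other companies is not in the vector store.
    return "unsupported"


if __name__ == "__main__":
    print(pick_tool("NVDA", wants_narrative=True))
    print(pick_tool("JPM", wants_narrative=False))
    print(pick_tool("JPM", wants_narrative=True))
```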

So give it a try and let me know what you think!

Chandler
