Tuesday, July 22, 2025

LLMs and Investment Research Agents

In the past month, I’ve spent a lot of time focused on raw data (see: SEC filings, reading PDFs), so it’s a good time to zoom out and look at the bigger picture. I think there are a few big questions:

  1. What can LLMs do that couldn’t be done before?
  2. Where can LLMs provide real value in the investment world?

How humans glean insights from data

Almost any research project can be generalized into a few steps:

  1. Data gathering – find trusted sources of data
  2. Data extraction – process the data from the source documents
  3. Data storage – save the data somewhere, in case we need it again
  4. Data analysis – slice and dice the data until you find something interesting
  5. Conclusion – report on the results
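
The five steps above can be sketched as a simple pipeline. This is a minimal illustration, not a real implementation: every function name and body here is a placeholder I'm assuming for the sake of the example.

```python
# A sketch of the five research steps as a pipeline of functions.
# All bodies are illustrative stand-ins for real work.

def gather(sources):
    """Data gathering: pull raw documents from trusted sources."""
    return [f"raw document from {s}" for s in sources]

def extract(documents):
    """Data extraction: turn raw documents into structured records."""
    return [{"source": d, "facts": []} for d in documents]

def store(records, db):
    """Data storage: persist records so we can reuse them later."""
    db.extend(records)
    return db

def analyze(db):
    """Data analysis: slice and dice until something interesting appears."""
    return {"record_count": len(db)}

def conclude(findings):
    """Conclusion: report on the results."""
    return f"Reviewed {findings['record_count']} records."

db = []
docs = gather(["SEC EDGAR", "Consumer Reports"])
report = conclude(analyze(store(extract(docs), db)))
print(report)  # Reviewed 2 records.
```

The point of the decomposition is exactly what the list says: when the output is wrong, you can inspect each stage in isolation instead of debugging one opaque "do research" step.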

For most things, we don’t even think about these being individual steps. For example, if you’re researching a public company for a personal portfolio, “data gathering” is Googling, “data extraction” is reading a few articles, and “data analysis” is simply “thinking.” Another example: if you’re deciding what car to buy, “data gathering” may mean finding the articles (e.g. Consumer Reports) or online forums (e.g. Reddit) that you trust the most, and “data analysis” means weighing the trustworthiness of each source in your own head. A broader example: a city policymaker may “gather data” through city statistics but also by schmoozing with city business leaders, “data analysis” is a complex synthesis of the two, and the “conclusion” might be a new policy.

Why bother breaking these steps out? While intuitive at a small scale, these steps become important as you scale up. Most tech companies now have separate teams for data extraction/storage (“data engineering”) and data analysis (“data analyst”). Similarly, as we look to delegate tasks to robotic “agents,” we can’t just tell them: “Do this complex task.” Software developers have to break down complex problems into bite-sized components, closely mirroring the way that we humans think. If (and when) the software fails, we can take a closer look at each step: why did this fail? And was the approach we took here the right one for the job?

Additive value of LLMs in the research process

I keep finding myself coming back to the fundamental question: what changed with LLMs? What problems do they solve that couldn’t be solved before?

In my view, the primary differentiating benefit of LLMs is the ability to (a) interact with the computer using truly natural language and (b) extract and analyze qualitative data (i.e. text, images, etc.). ChatGPT made such a big splash because of its ability to string together words into “fluent” English, but we’re coming to realize that its true value comes from its ability to process text.

It helps to view LLMs through the lens of the larger research process above. LLMs are excellent at “data extraction” (ingesting text) and good at some aspects of “data analysis” (regurgitating text). For example, if you have a list of 50 articles, an LLM is great at “reading” them all and summarizing the key points. Likewise, if you have a public company’s 10-K or a long legal document, an LLM should be great at pulling out details from it. They are getting better at “data gathering,” too; for complex or ambiguous questions, ChatGPT outperforms an old-fashioned Google search.

However, I believe humans still hold the edge in the other aspects of the research process (for now). Experts know where the best data is, how much to trust each source, and how to tie the pieces together. LLMs can – and should! – help with each step of the process, but in an inherently unpredictable, non-deterministic world, we’ll continue to rely on humans to make the consequential final decisions (“data analysis” and “conclusion”).

Investment research agents

As with LLMs, I find myself wondering: is there anything new with the advent of “AI agents,” or is the term just a brilliant stroke of VC marketing? At their core, all agents do is add (a) automation and (b) action-oriented APIs, chaining steps into a sequence that creates a mirage of computerized life. I don’t think there’s any truly novel technology here; instead, AI agents’ emergence is about ease, scale, and ubiquity. It has become almost easy for companies to adopt useful “AI agents” that provide tangible value (by building them in-house or hiring a start-up).

Likewise, it’s easy to see how agents could add immediate value in an investment office. For example:

  • Company analysis – given primary sources (10-Ks, 10-Qs, earnings calls, etc.), extract key metrics and an understanding of the business
  • Financial data extraction from audited reports – given quarterly updates or audited financials, extract the text and data into an internal database
  • Investment fund/founder due diligence – given the reams of public data about a person or investment fund (e.g. podcasts), build a one-pager
  • Investment pitch deck analysis – given an inbound pitch deck, partially fill out an investment framework (leaving gaps where we still need to ask questions)

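
The first bullet above (company analysis) is a good candidate for a concrete sketch. The shape is: prompt an LLM to return structured JSON from a filing excerpt, then validate the output before trusting it. Everything here is an assumption for illustration – `call_llm` is a stub standing in for any chat-completion API, and the prompt, field names, and placeholder figures are invented.

```python
import json

# Sketch of a "company analysis" extraction step. `call_llm` is a stub
# standing in for a real LLM provider call; its response uses made-up
# placeholder numbers so the sketch runs end to end.

EXTRACTION_PROMPT = """From the 10-K excerpt below, return JSON with keys
"revenue", "net_income", and "business_summary". Use null if a value is absent.

Excerpt:
{excerpt}"""

def call_llm(prompt: str) -> str:
    # Stubbed response; a real implementation would call an LLM API here.
    return json.dumps({"revenue": 1_000_000,
                       "net_income": 100_000,
                       "business_summary": "Example widget manufacturer."})

def analyze_company(excerpt: str) -> dict:
    raw = call_llm(EXTRACTION_PROMPT.format(excerpt=excerpt))
    try:
        # LLM output is untrusted text: parse and validate before use.
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "model did not return valid JSON", "raw": raw}

metrics = analyze_company("Total net sales were $1.0 million...")
print(metrics["revenue"])  # 1000000
```

The validation step matters: because model output is non-deterministic, anything feeding an internal database needs a parse-and-check layer rather than blind trust.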
These types of agents are starting to gain traction. LlamaIndex, one of the leading open-source agentic frameworks, advertises “faster data extraction, enhanced accuracy and consistency, and actionable insights” for finance workflows. One of its testimonials comes from Carlyle, which built an LBO agent that automatically fills structured outputs from unstructured 10-Ks and earnings decks.

A 5-year plan for LLM-infused investment offices

I recently read about a solo VC GP (Sarah Smith) who’s built an “AI-native firm that can deliver 10x value in 1/10 of the time.” If we take this at face value, it represents a new, tech-forward model of how smaller offices (such as endowments and foundations) can operate with a lean team.

I think the small investment office of the future will need a couple of key things:

  1. LLM-driven investment process. An AI agent should be able to start with a list of “most trusted sources” (e.g. Bloomberg, SEC EDGAR, Pitchbook, internal notes) and then branch out (e.g. via a self-driven Google search) to output a strong first pass of due diligence. It’s then up to the analyst team to review, think, and draw conclusions.
  2. Strong, well-structured internal database. Machine learning systems work best on clean data (e.g. a single source of truth for the performance of companies and funds). LLMs (combined with data engineering) can help convert PDFs into well-structured data, which then fuels future analyses.
  3. Data-driven and process-focused governance. If we believe that LLMs and data will change the world, the challenge becomes integrating these new tools into everyday workflows. From my experience in healthcare, this integration step is the hardest part – getting people to adopt and trust a new system takes real effort.
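
Item #2 can be made concrete with a minimal schema sketch. This is an illustrative assumption, not a recommended design: a single performance table keyed by entity, period, and metric, with a provenance column pointing back at the source PDF that the extraction step came from.

```python
import sqlite3

# Sketch of a "single source of truth" performance table, populated by an
# upstream PDF-extraction step. Schema and column names are illustrative.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE performance (
        entity      TEXT NOT NULL,   -- fund or company name
        period      TEXT NOT NULL,   -- e.g. '2025-Q2'
        metric      TEXT NOT NULL,   -- e.g. 'net_irr', 'revenue'
        value       REAL,
        source_doc  TEXT,            -- provenance: which PDF this came from
        PRIMARY KEY (entity, period, metric)
    )
""")
conn.execute("INSERT INTO performance VALUES (?, ?, ?, ?, ?)",
             ("Example Fund I", "2025-Q2", "net_irr", 0.18, "q2_update.pdf"))
row = conn.execute(
    "SELECT value FROM performance WHERE metric = 'net_irr'").fetchone()
print(row[0])  # 0.18
```

Keeping a `source_doc` column is the design choice worth highlighting: when an LLM-extracted number looks wrong, you want a direct path back to the original document.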

In the past month, I’ve been most focused on item #1 above, building an LLM-driven investment tool. That has meant spending time in the minutiae of the data, learning how LLM agent systems are architected, building a (naïve) prototype, and evaluating which start-ups and companies are worth using in this space. Many more posts to come on progress here (as well as investment deep dives, the end goal of all these tools!).
