Sunday, April 26, 2026

The various modes of investment work

I've worked on a few deals in the past month, and I now finally have a lull to think a little more about how to use AI to automate/replicate some of the work I've been doing. I've been thinking a lot about the types of work that I do. Investment memo work falls into a few buckets:

  • Information distillation - given a data room of information, distill it down into the main points
  • Information search and aggregation - given a topic, try to find as many pieces of info on it and aggregate them together
  • Information verification - given a piece of information, verify whether it's correct 
  • Financial analysis - digging into the historical financials as well as checking the assumptions in the financial projections
  • Anomaly detection - given a document or set of data, see if any piece of it is off/wrong
  • Interviews with founders, experts, customers, etc. - requires a lot more EQ and conversational skills, but also preparation to tee up the highest impact questions for the investment (and the best fitting questions for the interviewee)
This all feels a little abstract and unnecessary, but teasing out these modes of investment work is how a developer would approach a broader fix to the problem. And, more importantly: AI is going to be used in a different way for each of these modes of work. I think LLMs are sufficiently strong that a lot of problems can be resolved through clever infrastructure (i.e. not just throwing everything into the chatbot and hoping it does everything we want). 

Some examples of the investment memo process that are particularly time-consuming:
  • Industry background - a lot of "information search and aggregation" -- Googling, ChatGPT'ing for trusted sources, listening to podcasts, reading consultant market maps, etc. Requires a lot of exploration and breadth; feels a little like climbing a mountain, and the surrounding landscape becomes clearer bit by bit. 
  • Competitors - "information search and aggregation" -- Google/ChatGPT for competitors, then look up info on each of them (market niche, traction, last fundraise, etc.)
  • Public comps and recent M&A - "information search and aggregation" - Google/ChatGPT for this info, then look each up (e.g. in Bloomberg/Yahoo Finance for public comps, internet verification for recent M&A)
  • Company (e.g. team, product, GTM, moats) - "information distillation" -- taking the whole data room and compressing it into a few pages of info. There's additional critical thinking (e.g. does their GTM actually make sense? are the moats really moats?), but otherwise a lot of depth here.
  • Term sheet review - most terms are typically standard (or within reason), so this is both "information distillation" (taking a 30-page legal document and boiling it down to a few key points) and "anomaly detection" (seeing if there are any particularly egregious or atypical terms). 
  • Traction/Financials - "Financial analysis", but then a lot of critical thinking and gut-checking on top of that.
  • Investment benefits and risks - this feels like it should be a lot about "gut" ... but I think an LLM could surface many good benefits/risks (and a human could then review and prioritize them)
I've been working on a verifiable, AI-driven "information distillation" process. Given a data room of information (or a pile of industry reports from McKinsey plus a few internet articles I dug up), can I get the AI to synthesize the info rooted in the files I give it? 

Part of me feels a little foolish for doing this: the Googles and Anthropics of the world are already doing this, and they have teams that are much smarter than me! However, I think the tech giants are solving a fundamentally different problem. Their chatbots are generally interested in: "can I answer any question given to me well?" and "if the user uploads documents, can I answer any question he asks about it?" The challenges are at least twofold: (1) the chatbot has to retrieve info about any topic I might throw at it, and (2) it has to be able to answer any question I ask. 

My little tool is much simpler: given documents, read the info and file it away neatly into folders that I ask it to. The process looks something like this:
  • Go through each document and extract information relevant to different buckets I give it (e.g. "company background," "team," "moats," etc.)
  • After processing all the documents, take one final pass to synthesize the extracted data together
Advantages: 
  1. You can apply this process to any document in the data room (just need to dial in the "folders" you file info into)
  2. The extracted info and process are verifiable (unlike typical LLMs which are black boxes)
  3. You can apply this process to many use cases -- just swap out the extraction schema and synthesis prompt
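As a rough sketch of that process (the `llm` callable and the bucket names are placeholders I made up, not a real API), the per-document extraction plus final synthesis pass might look like:

```python
from typing import Callable, Dict, List

def extract(doc_text: str, buckets: List[str],
            llm: Callable[[str], str]) -> Dict[str, str]:
    """Pull bucket-relevant info out of one document via the LLM."""
    notes = {}
    for bucket in buckets:
        prompt = (f"From the document below, extract only information "
                  f"relevant to '{bucket}', quoting the source text.\n\n"
                  f"{doc_text}")
        notes[bucket] = llm(prompt)
    return notes

def distill(docs: List[str], buckets: List[str],
            llm: Callable[[str], str]) -> Dict[str, str]:
    """Per-document extraction, then one synthesis pass per bucket."""
    per_doc = [extract(doc, buckets, llm) for doc in docs]
    synthesis = {}
    for bucket in buckets:
        combined = "\n---\n".join(notes[bucket] for notes in per_doc)
        synthesis[bucket] = llm(
            f"Synthesize these extracted notes on '{bucket}':\n{combined}")
    return synthesis
```

Because every extracted note is stored per document and per bucket, each claim in the final synthesis can be traced back to its source file, which is what makes the process verifiable.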
I'll have another post with more details, once I tidy up the development.



Saturday, April 25, 2026

AI pulse check

Tech is changing week to week, so it seems worthwhile to journal the in-the-moment sentiment. Tech news — especially the stories that get clicks — tends to be sensational, absolutist, and, conveniently, good marketing for tech products. 

Token consumption: one thing I remember hearing a few months ago was “don’t think about token count at all when you build, LLMs will continue to get cheaper and go to zero.” I was a little skeptical — I’ve gotten burned on cloud hosting costs, and am skittish about letting pay-as-you-go LLMs go unchecked. There’s been some pullback on this sentiment: (a) agents run amok, churning through tokens without real value, (b) announcements of companies trying to maximize their token usage, a clearly misguided goal, (c) realization that smaller models (and gasp Chinese models) may be good for some use cases, and (d) the realization that cheap tokens are subsidized by VC money, and over time prices might increase just as Uber prices have. 

Pure vibe coding enthusiasm has also been tempered. The market believed (believes?) that vibe coding would end SaaS, but it is hitting the realization that software is ~20% code (the rest is infrastructure, marketing, support, installs, etc.). I buy that vibe coding is lowering the barrier for designers to build MVPs, but I don’t think it’ll be able to build full apps well within the next few years; there are too many software architecture decisions that will be hard for it to make.

The hot topics on my mind today: 

- Claude Cowork — I need to explore this more; it’s gaining early adoption. I’m a little nervous about it having access to all local documents, but I can see the appeal of having one provider with access to all your docs.

- Skills — excellent marketing term for a “template of instructions for the LLM.” Lowers the barrier for LLM entry, adds abstraction in a way non-programmers can understand. Need to play with this more to see if there’s more to it than that. 

- Process and clear instructions — interestingly, with “skills,” the hard part is the boring stuff: process, governance, clear instructions. I heard one good analogy: treat AI like a new employee — you have to onboard it with culture, rules, norms, all the boring stuff. I think this will continue to be a trend; humans will be more and more important, just doing (what is to me) more boring work. But with AI, as with great companies: governance, structure, and feedback win.

- Death of SaaS — world still seems split on whether SaaS is dead. Maybe it’s moving from a seat-based model to an outcomes-based model? I think one way SaaS dies is if (a) you can build your own software or (b) there comes along a superapp that can do everything for you. I’ve tried (a), and that approach may work for a small subset of technical people (who probably were building their own tools pre-AI). I could imagine a world where all computer commands are routed through an Anthropic or OpenAI who has access to your whole digital world. Hard to tell now if this will be mass-adopted or just confined to the most technical. 

Saturday, April 18, 2026

Example of AI coding speed + implementation

My start-up, Transpose Health, has a core data conversion engine, but data work also has a lot of random stuff that needs to be done. AI coding has been transformative for this. What might've taken a few hours to code (and test) now takes a couple minutes to prompt ChatGPT for, then a minute for it to code. 

My prompt:

write me a python script that: 

 - opens up <excel file> (an excel file) 

 - loops through all CSVs in <folder> that are CSVs and do not contain the word "Copy" (table 2) 

 - inner joins table 1 to table 2 on "Legacy Rx" and "PrescriptionNumber" (to only get the rows from table 2 that are in table 1) 

 - prints out all of the messages from the "hl7_str" column in table 2 into a txt file (separate the "hl7_str" rows with a return character)
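For illustration, the kind of quick-and-dirty script this prompt is asking for might look like the sketch below (pandas assumed; the file paths are placeholders, since the prompt's <excel file> and <folder> are elided):

```python
import glob
import os
import pandas as pd

def join_messages(table1: pd.DataFrame, table2: pd.DataFrame) -> list:
    # inner join: keep only table-2 rows whose PrescriptionNumber
    # matches a Legacy Rx in table 1
    joined = table1.merge(table2, left_on="Legacy Rx",
                          right_on="PrescriptionNumber", how="inner")
    return joined["hl7_str"].astype(str).tolist()

def collect_messages(excel_path: str, csv_folder: str) -> list:
    table1 = pd.read_excel(excel_path)  # table 1
    messages = []
    for path in glob.glob(os.path.join(csv_folder, "*.csv")):
        if "Copy" in os.path.basename(path):  # skip "Copy" files
            continue
        messages.extend(join_messages(table1, pd.read_csv(path)))
    return messages

# usage (placeholder paths):
#   msgs = collect_messages("table1.xlsx", "csv_folder")
#   open("messages.txt", "w").write("\n".join(msgs))
```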

Of course, ChatGPT was not perfect the first time around. I had to (a) prompt ChatGPT to be less verbose (I needed a quick-and-dirty script I could review quickly, not an industrial-grade one), (b) run the script, (c) debug it (it had a small bug, so I had to do a deeper code review), (d) then test the outputs and iterate. Overall, the end-to-end process took ~30 minutes, but I saved significant time and brain energy on the coding part. 

(Interesting that my value-add shifts from understanding/writing the code to understanding how to prompt the LLM and QA the outputs. In other words: AI is taking all the technical implementation work from me, for better and for worse!)

Friday, April 17, 2026

AI: the boring things will be the most valuable

 I've been writing quite a few memos in the past few weeks, poring through data rooms with tens or hundreds of documents. I think I'm in a unique spot because (a) I love poring through data rooms (perhaps spending too much time on it), (b) it feels like many parts of the process are slower than they should be, and (c) I know that some of the pieces could be done by LLMs. 

Some concrete examples from VC: company background, IP, founder background, cap table, and past funding rounds are all pretty time-consuming, as well as a little tedious/rote to write and low on the time-to-value scale. (I'm more often than not annoyed that I have to dig through legal docs to get the investors, amounts, terms, etc. from prior rounds -- can't AI do that yet!?)

= = 

I've spent more time than the average person thinking, "If I were to write a program to do my job, how would I have it do my work?" (This pursuit of abstraction -- from real world to code -- is what software development is all about.) It turns out, when I'm writing an investment memo, I have a generally set rubric of questions I'm trying to answer. The largest, most institutional players (like NEPC, $1.7T AUA) have frameworks and checklists to make sure you're hitting all the boxes. For me, these frameworks feel a bit boring, but it's hard to pin down exactly why; perhaps in my imagination, investment research is a creative endeavor that can't be done by checklist alone. Or perhaps it's that nobody is excited about process -- which in this case can mean a 200-line checklist. 

But ... this framework approach approximates how humans think. I have a list of sections that I need to fill in, and each successive slide deck or Excel or article that I read adds some piece of evidence to one or many of these sections. To make it less abstract: let's say I'm researching low earth orbit satellites, and my goal is to write an industry report. My sub-sections might include: market segmentation, market size, growth drivers, demand drivers, regulatory/policy, and market risks. I might read a few McKinsey market reports that fill in some sections, then a policy report, talk with an expert, then review a few pitch decks for LEO start-ups. With each new document, I gather information for each bucket, then after I've reviewed everything, I synthesize it together, weighing the most trusted resources more highly. The process never happens this clearly, but I believe these are replicable steps that most people follow.

This same general framework -- read documents, extract relevant data, organize data into a framework, resolve data conflicts, and synthesize a final analysis -- is what most knowledge work seems to follow. Breaking it down this way gives us a chance of replicating the steps with AI.

Why are LLMs not good at this out of the box? I'd argue that LLMs are general-purpose tools, built so that you can ask any question you want. Tools like Google's NotebookLM are excellent at this -- feed it documents, and ask it anything. Investment research -- and most knowledge work -- fundamentally has an end goal or question, and the framework/process that you use to get to the answer is "tribal knowledge," typically not well-documented and slightly different firm-to-firm. This "tribal knowledge" is what the LLM lacks out-of-the-box, and what I think can be codified for success.
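Codifying that tribal knowledge can start very simply. A sketch of what a written-down extraction schema might look like (the section names and questions below are my own invention, not any firm's actual rubric):

```python
# A hypothetical slice of a memo schema: the "tribal knowledge"
# written down as sections and the questions each section must answer.
MEMO_SCHEMA = {
    "market size": [
        "What is the current TAM/SAM/SOM, and per which source?",
        "What growth rate is assumed, and over what period?",
    ],
    "moats": [
        "What does the company claim as its moat?",
        "Is the moat structural (IP, network effects) or just execution?",
    ],
}

def build_extraction_prompt(section: str, schema: dict) -> str:
    """Turn one schema section into an extraction prompt for an LLM."""
    questions = "\n".join(f"- {q}" for q in schema[section])
    return (f"Extract evidence from the attached document that answers:\n"
            f"{questions}\n"
            f"Cite the page or slide for each point.")
```

The schema, not the prompt plumbing, is where the firm-specific value lives; two firms running the same code with different schemas would produce very different memos.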

= = 

The knowledge extraction process then becomes:

1. Read documents,

2. Extract relevant data points,

3. Organize data into framework,

4. Resolve data conflicts, and

5. Synthesize a final analysis.
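As a toy example of step 4, conflicting data points could be resolved by weighing sources by trust (the source names and weights below are hypothetical; in practice they would encode the firm's own view of each source):

```python
# Hypothetical trust weights per source type.
TRUST = {"mckinsey_report": 0.9, "pitch_deck": 0.4, "blog_post": 0.2}

def resolve(claims: list, trust: dict) -> str:
    """claims: list of (source, value) pairs for one data point.
    Pick the value backed by the highest total trust, so several
    weak sources that agree can outweigh one stronger source."""
    scores = {}
    for source, value in claims:
        scores[value] = scores.get(value, 0.0) + trust.get(source, 0.1)
    return max(scores, key=scores.get)
```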

Each step has its own technical challenges. For example, finding relevant documents can be difficult, as is reading them (reading slide decks, especially graphs and charts, is not easy for LLMs today). I believe the most valuable piece, though, is creating a knowledge framework, documenting that "tribal knowledge." The more we use AI, the more we'll realize that for most companies, value will still come from the boring things like process, documentation, and frameworks. 

Tuesday, April 7, 2026

Cash Flow Tooling for Fund Analysis

I've gotten quite "lazy" after working at a tech company for a while -- I know automation exists, and so I try very hard to avoid doing rote tasks. (Classic software engineering trap: spend 10 hours to save 1.)

That being said, AI coding tools should be able to help me do this quickly and automatically. There are two big steps to this process (we're only tackling the second here):

  1. Retrieve data (manual step for now) -- from PDFs, Excel files, etc. For now, I've done this manually, because this is a surprisingly hard problem to do at scale. (The files can be in so many different forms, and PDFs are notoriously tricky to read! There's also an open question on how much AI should process this vs. other means; e.g. if it's in Excel already, why use an LLM and run the risk of it hallucinating data?)
  2. Analyze data and build pretty graphs (AI-assisted code) -- Once the data is clean, this step requires no LLMs, just regular code. (This also means no private info is sent to any LLM.) Again, I'm using Google Antigravity; I've copied my initial prompt below.

Initial Google Antigravity prompt to build IRR tool:

help me build another tool -- "Cash Flow Analysis" -- that helps to analyze a VC/PE Fund's cash flows.

This should take in as input an Excel sheet with the following sheets/columns:
"LP Gross Cash Flows" - columns: Fund, Company, Date, Gross Cash Flow, Notes

"Unrealized Value" - columns: Fund, Company, Date, Gross Cash Flow

"Unlevered Cash Flow" - columns: Fund, Transaction, Type, Date, LP Net Cash Flow

"LP Capital" -- columns: Fund, Date, LP Net Cash Flow


Outputs:
- a graph of cash flows, usin ghte LP Gross Cash Flows chart, with $ amount on Y axis and time period on x-axis. DIsplay negative cash flows as a red bar (down) and positive (i.e. distributions) as blue. also have a "net cash flows" line graph

use altair to build this. allow filtering by Fund and Company. also allow a parameter for the x-axis to be quarterly, half-year, and year increments.

- generate a table showing for each fund/company the total investment, total distribution, whether exited completely, gross IRR.

- calculate total gross IRR per fund, using unrealized value in the equation as well


build a similar graph as above with unlevered cash flow tab. use altair to build this. allow filtering by Fund. also allow a parameter for the x-axis to be quarterly, half-year, and year increments.

calculate fund-level IRR using the current unrealized values in LP Capital slide

Errors and Clean-Up
AI coding can be finicky. (Equally finicky are my prompts -- there's a ton of room for AI to "interpret" what I'm asking of it.) This is where a lot of the extra time with AI code comes in -- refining, revising, pivoting, etc. Here are all the additional prompts I added on, typos and all (took ~2 solid hours in total):
  • I'm hitting this error: Error processing Excel file: 'Company'
  • Error processing Excel file: Invalid frequency: QE, failed to parse with error message: ValueError("for Period, please use 'Q' instead of 'QE'")
  • Error processing Excel file: 'Column not found: Gross Cash Flow'
  • meh roll back that change. Instead can you abstract all of these column headers to the top of the routine as variables? maybe use a naming convention like "LP_GROSS_CASH_FLOWS__FUND" to designate "SHEET__COLUMN". this will help when we expand this later on
  • can you remoe comapny filter option from LP Gross Cash Flows? Maybe jus filter the company summary table by that ... also -- if it's yearly x-axis increments, can you just show the year and not months in between? .... also the fund filters tabs are cut off so it's ahrd to see the full fund name, can you fix? ... i also get this error in company summary table: Error processing Excel file: complex exponentiation.
  • can we replace this with a standard py function? def xirr(cashflows, dates):
  • remove the company filter entirely ... for 2022 for one of the funds, i have both investments and distribution and i expect to see both, please update ... in the company summary table, add in "earliest investment" "last distribution" and "unrealized value" columns
  • can you also adjust the green "net cash flow" to be the disributions minus investments? also label this line. can you also disable zooming and out of the graph, and make it a little taller?
  • move the "gross IRR per fund" into a streamlit metric ... below company summary table, add in a filter to allow filtering by company to see all investments/distributions/unrealized value
  • in the "detailed company data" -- can you sort the list alpahbetically for ease?
  • can you update Fund Filter to single-select dropdown
  • okay - let's remove the x-axis iincrements -- let's just keep it at "year" roll-up. ... i am still seeing two years per year (e.g. 2017 2017) in the x-axis.... Let's rename x-axis to "Year' also .... add a box around the metric ....
  • at the fund level: can you add as metrics "total invested", "total distributed" , "DPI" (which is distributed / invested), MOIC, earliest investment, latest investment .... at the "detailed company data" level, can you add in IRR per fund, total invested, total distributed, unrealized value as metrics?
  • rename "Positive/Distrubtion" to just "Distribution" and "Negative/Investment" to just "Investment" ... make the bars in the bar graph skinnier (more aesthetic)
  • split the metrics onto two lines with 4 metrics wide (the full $ amount gets cut off). maybe also abbreviate $5,000,000 to $5.0M (maybe create a shared tag to do that, if a library function doesn't exist?)
  • can you add all the same metrics to the unlevered cash flow screen? (total invested, total distributed, unrealized value, dpi, moic, gross irr, latest investment) ... can you also add the vintage to both
  • can you actually add the vintage in the name of the fund as well? ... in a collapsable section, can you also print out all the numbers that go into the gross fund-level IRR calculation? (i need to check #s) ... remove "Micro Metrics" (I think it's not needed)
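One of the prompts above asks to replace the IRR calculation with a standard Python function. A self-contained, Excel-style XIRR via Newton's method might look like the sketch below (a sketch only; it doesn't handle pathological cash flow patterns where Newton's method fails to converge):

```python
from datetime import date

def xirr(cashflows, dates, guess=0.1, tol=1e-7, max_iter=100):
    """Annualized IRR for irregularly spaced cash flows.
    cashflows: amounts (negative = investment, positive = distribution);
    dates: matching datetime.date objects."""
    t0 = min(dates)
    years = [(d - t0).days / 365.0 for d in dates]
    rate = guess
    for _ in range(max_iter):
        # NPV and its derivative with respect to the rate
        npv = sum(cf / (1 + rate) ** t for cf, t in zip(cashflows, years))
        dnpv = sum(-t * cf / (1 + rate) ** (t + 1)
                   for cf, t in zip(cashflows, years))
        if abs(dnpv) < 1e-12:
            break
        step = npv / dnpv
        rate -= step
        if abs(step) < tol:
            return rate
    return rate
```

For example, investing 100 and receiving 110 exactly one year later should yield roughly a 10% IRR.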
Result
Here's a redacted version of the output. Overall -- fairly successful project, with a good foundation (i.e. not vibe coded). Certainly can be refined more, but a good look at "what's possible today" (and hopefully a time saver for me in the medium- and long-term!).
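One small piece from the prompt list above (abbreviating $5,000,000 to $5.0M) is the kind of shared helper worth pulling out. A sketch (the function name is my own; this isn't the generated code):

```python
def abbrev_dollars(amount: float) -> str:
    """Format a dollar amount compactly, e.g. 5_000_000 -> "$5.0M"."""
    sign = "-" if amount < 0 else ""
    amount = abs(amount)
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if amount >= threshold:
            return f"{sign}${amount / threshold:.1f}{suffix}"
    return f"{sign}${amount:,.0f}"
```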


Sunday, April 5, 2026

Emotions and Investing

The current global geopolitical situation is distressing to me. That we went to war with Iran -- seemingly on a whim, with a first strike, and with the belief that we'd solve the Middle East in just a few days -- has to be amongst the dumbest, most unforced errors I can imagine, a level of hubris only enabled by yea-sayers following Venezuela. 

4/2/26: Yet another memorable (and jaw-dropping and frightening) Truth Social post today ...

I couldn't get past the sheer stupidity of our Iran entanglement, which I believe made me not notice it as a good potential investment opportunity. The smart investors, however, are able to think through these types of events a little more clearly. In hindsight, there was probably a decent chance that Trump would do something in Iran, especially after his Venezuelan conquest, which would inevitably drive up the price of oil. A simple bet, then, might've been long oil -- low-ish downside but high potential upside. 

More broadly, I think it'll take a little more time and experience to extricate my emotions from the act of investing. It brought me back to my days working in healthcare IT: inevitably someone would do something stupid, and we (I) would have to help clean it up. I got really good at just ignoring who caused the issue and instead diving in and fixing it. Perhaps it's a little harder with investing, especially because politics are inherently emotional (especially today). Nevertheless, still a good weakness to identify and an aspiration to work towards.




AI Notes - AlphaSense and Future Home Grown AI Tools

In the past month, I've written a few investment memos from soup to nuts, so I now have strong opinions on what LLMs might be good for. 

First, AlphaSense. We have access to AlphaSense, so I've been able to give it a whirl. My takes:

  • Mediocre but voluminous investment memo writer -- We've come up with a decent investment memo prompt, and it can pull together a 20-page investment memo in ~15 minutes. However, some info isn't relevant, some isn't well supported, and overall ... it's 20 dense pages. More detailed gripes below.
  • References to internal sources -- One thing I love and hate is that it links to internal materials (e.g. Affinity, Sharepoint). Upside: when I've had trouble finding a single piece of info from the data room, AlphaSense did a good job on hunting down the sources of truth. Downside: it'll link to old memos or in-flight memos, which is not helpful if I'm trying to verify or double-check details for the in-flight memo. 
  • Expert calls -- I've been diving into some topics I have little prior knowledge in, so I've tried to make use of what I see as AlphaSense's primary edge: expert calls. I've tried to ask it something like: "Find me all expert calls that talk about X industry or Y company or its competitors" to get better situated from insiders. Overall valuable, but not worth the price of AlphaSense alone.
My current thoughts on tools that'd be (a) buildable and (b) useful:
  • Draft investment memo writer -- I think a well-constructed LLM tool can write a good investment memo. The keys are (a) asking the right sequence of questions to the LLM, (b) linking out to external sources for some pieces (e.g. for public market comps), and (c) knowing which pieces to delegate to the LLM and which pieces should be 95% human. (I'll dive into this more in another post, but one example: the "Investment Thesis" section should 100% be human-written -- if not, what is your value as an investor??)
  • Data room analyzer -- I wrote a little tool to help spill out all the contents of the data room (i.e. show me all the files in all the folders in an easy way), which is a good start. The next step would be to get a rough summary of each file, and the step after would be doing some heavier analysis. It'd also be great to be able to ask questions of the files, or at least have it help me find where X company is referenced (i.e. what I use AlphaSense for). Should be buildable with LLMs and RAG (at least in a rudimentary form) -- the big question is whether 80% of max capability (what I-plus-AI am capable of) is good enough.
  • Investment memo cleaner upper -- Would love to have a tool to go through a memo and tidy up tables, help with resizing, flag areas to review/condense, etc. Not sure how much you can integrate into a Word doc. Will explore.