Monday, May 25, 2026

Progress on CRM

I wrote a few weeks ago that I had started to build a CRM tool. I've made pretty good progress, but I've realized that it's less about the CRM itself and more about building the tools that I want to see, with AI/document processing at its core. I've come to remember that I'm opinionated on making the tools as easy to use as possible, but also insistent on clean audit trails and ease of troubleshooting, as well as clean data structures and modular code. 

What's done so far:

  • Core data models -- I've built out the core data models and data tables into Supabase. Users can manually build companies, investment firms, and people into the system. 
  • Document uploads -- Users can upload documents -- which can then create new companies, etc. in the system and link them automatically. I think this is the holy grail -- just upload documents and have it update your internal notes! 
  • Audit trail -- I have a good audit trail mechanism built out, telling you how a company was created (manually vs. automatically), as well as whether individual fields were updated. It may seem like overkill now, but I think it's a must longer-term (and something we can build on). Good foundations!
  • Authentication -- started but not quite working. Google OAuth login is enabled through Supabase, but something is still not 100% working.
  • Designed to be plug-and-play -- I've designed this to be plug-and-play for any investment firm. All you'd need to do is plug in a few of the company's own things (e.g. Supabase information, OpenRouter API key, and a few more things), and you'd be off and running! (This could also be deployed in a Docker container and shipped to multiple customers!)
  • Data pipelines -- Much of the data gets piped in from somewhere, on some sort of cadence. Examples: (1) some VCs look at a16z's website weekly to add to their sourcing queue, (2) CT Innovations could pull companies from the CT business registry weekly, (3) newsletters delivered to my email might contain start-ups I should check out. I have a simple data structure to schedule these pipelines, which will need to be built on with more use cases.
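
The field-level audit trail described above could be modeled roughly as follows. This is a minimal sketch, not the actual Supabase schema: `AuditEvent`, `diff_to_events`, and the `source` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class AuditEvent:
    """One row of a hypothetical audit table."""
    entity: str                 # e.g. "company"
    entity_id: int
    action: str                 # "created" or "updated"
    source: str                 # assumed labels: "manual", "document_upload", "pipeline"
    field_name: Optional[str] = None
    old_value: Any = None
    new_value: Any = None
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def diff_to_events(entity, entity_id, before, after, source):
    """Return one 'updated' event per field whose value differs between before and after."""
    events = []
    for name in sorted(after):
        if before.get(name) != after[name]:
            events.append(
                AuditEvent(entity, entity_id, "updated", source,
                           name, before.get(name), after[name])
            )
    return events


if __name__ == "__main__":
    before = {"name": "Acme", "hq": "Hartford"}
    after = {"name": "Acme", "hq": "New Haven", "sector": "health"}
    for event in diff_to_events("company", 1, before, after, "document_upload"):
        print(event.field_name, event.old_value, "->", event.new_value)
```

Recording the source on every event is what lets the trail answer "manual vs. automatic" later without guessing.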
What's next:
  • Vision of email-driven + automated workflows -- From what I've seen, investment workflows and communications are driven largely by email. Therefore, a good CRM should integrate into existing workflows, while having the capacity to augment them. What this looks like: the system tracks email flow, downloads documents (and automatically uploads them to the CRM), and the CRM interprets where we are in the process. A concrete example: if we're doing diligence on a fund, the CRM tool should (1) automatically download the pitch deck and supporting docs, (2) interpret them, (3) interpret and import important dates (e.g. expected closing date or other funds circled), and (4) ensure we follow up (either with a rejection or a request for updates, if it's been a few months since last communication). The system should drive the investor to do this, without having to configure anything additional in the system. 
  • More granular company/investor data -- Next up from the data model side will be adding in things like revenue (for a single company), funds (for investment firms), and other data (portco allocation, percent ownership, cash flows, etc.) The challenge is finding a good, flexible data structure that can account for all the pieces of info you might want to store discretely. 
  • Showing provenance of data -- Data can come from multiple sources, e.g. public/published articles, hearsay from other investors, and source documents from the company itself. The system should be smart enough to track the data provenance -- revenue numbers from the company itself are much better than a guesstimate from a published article, which is much better than a rough estimate. This distinction is crucial for contextualizing the system data (and thus for downstream AI processing).
  • Handling Excel and better text extraction -- Right now, the text extraction (i.e. PDF -> text) is pretty good but not bulletproof, and it'll need to be bulletproof for us to trust the numbers. Likewise, we don't handle Excel yet. (Excel documents are universally tricky for LLMs to handle.) Both are fixable but more technical problems; better to get a rough version 1 of the entire system before homing in on these challenging minutiae. 
  • Integration with other vendors -- So far, I have integration with Google OAuth for login. Perhaps some integrations with other parts of the investment stack (e.g. CRMs, knowledge management tools, financial tools) to allow this tool to orchestrate everything together.
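
One way the provenance idea above could work in code: store every value with its source, and pick the most trustworthy one when asked. This is a minimal sketch under assumptions; `DataPoint`, `SOURCE_RANK`, `best_value`, and the source labels are hypothetical, and where hearsay from other investors belongs in the ranking is left open.

```python
from dataclasses import dataclass

# Lower rank = more trustworthy. The ordering follows the bullet above;
# the label strings themselves are assumptions for illustration.
SOURCE_RANK = {
    "company_document": 0,
    "published_article": 1,
    "rough_estimate": 2,
}


@dataclass(frozen=True)
class DataPoint:
    metric: str        # e.g. "revenue_2025"
    value: float
    source_type: str   # one of the SOURCE_RANK keys
    source_ref: str    # file name, URL, or note identifying the source


def best_value(points, metric):
    """Return the data point for `metric` with the most trusted source, or None."""
    candidates = [p for p in points if p.metric == metric]
    if not candidates:
        return None
    return min(candidates, key=lambda p: SOURCE_RANK[p.source_type])


if __name__ == "__main__":
    points = [
        DataPoint("revenue_2025", 4.0e6, "published_article", "news article"),
        DataPoint("revenue_2025", 3.2e6, "company_document", "deck.pdf"),
    ]
    print(best_value(points, "revenue_2025"))
```

Keeping all the data points (rather than overwriting) means the less trusted values stay available as context for downstream AI processing.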
What I've learned/observed about AI and sustained human superiority:
  • Humans still need to make decisions -- I can use AI to help me build the data models and SQL tables, but I (as the human) still need to make decisions on how simple or complex the data models are. LLMs often over-engineer, so it's still up to me to set a good foundation by building a simple but strong core data model. (These data structure tradeoffs are nearly impossible for AI to handle -- it doesn't know exactly what I want to build next, so it doesn't know what foundation to lay!)
    • A silly example: if we had robots that could build a house, we would still need expert humans to help guide us through the trade-offs (e.g. how many bathrooms, what materials to use, cost vs. quality of material, which type of siding, etc. etc.). No difference in engineering code.
  • AI lacks taste -- For some decisions, like on UI layout, the AI adds a bunch of crud which makes the app look/feel distasteful. I'm no designer, but I think on many decisions, I have taste that AI will never quite be able to capture. 
  • Humans know when to quit (and where to be creative) -- My AI coding agent was dogged in trying to fix a particular issue -- it tried to brute force its way through the problem, without success. After about 20 minutes (and a few failed updates), I pushed it to try a different solution.  (Technical detail: I asked it to make a synchronous call asynchronous, and update/centralize code to make this work.) The new solution worked well. It's a case where a human (me) knew when to say: "This is taking longer than it should, and I have other good ideas on how to fix this that match my longer-term vision better." This is also a case where knowing how the underlying system works is crucial -- if this were vibe-coded, there'd be no way to know the root cause of the problem, or approaches to fixing it. 
  • AI coding agents are great at execution -- AI coding tools are very good at writing code. The more prescriptive you can be, the better the execution.


Sunday, May 24, 2026

Claude Cowork vs. In-House Tools

Just a month or two ago, the tech world boasted about how many tokens it burned (i.e. how large its operating expense was). That party didn't last long. The consensus now is that for many jobs, hiring a person is easier/cheaper than hiring an AI agent. 

This has been my fear with agents all along: (1) what kind of agent truly needs to run automatically 24/7, and (2) how many tokens would it eat up needlessly? Side note: I got burned with cloud hosting on both Snowflake and Databricks for a personal project -- billed for cloud/GPU capacity that I wasn't using at all -- so I'm more sensitive than most to trusting tech companies with my credit card.

I've also consistently heard great things about Claude Cowork. I've been tinkering with building my own tools for a while, and I had a moment of panic: is there any use in what I'm building, or am I reproducing things that Claude (and thousands of other smart developers) are already building?

ChatGPT helped drum up where the tradeoffs are. I put them below, as a reminder to myself that in-house tools -- ones whose inner workings you understand, that link to the data you want, etc. -- have lasting value. Maybe not as a venture-backable company, but real time-saving workflow value. 

Anyways, here are the top dimensions where building your own software really shines over what Claude (or similar) offers. 

Dimension                  | Claude / Chat UI     | Custom Pipeline
Time-to-value              | Extremely fast       | Slower
Repeatability              | Weak/moderate        | Strong
Flexibility                | Limited              | Full control
Structured data extraction | Okay                 | Excellent
Audit trails               | Weak                 | Strong
Multi-stage workflows      | Awkward              | Natural
Integration with database  | Limited              | Native
Cost at scale              | Can become expensive | Often cheaper at volume
Vendor lock-in             | High                 | Low
Human-in-loop review       | Limited              | Fully customizable

Tuesday, May 12, 2026

Intuition vs Process

In the investing world, there's a tension between intuition and process. 

Some of the "greats" that I've met seem to eschew frameworks or process. The thinking goes: process makes you lazy -- you think better if you start from "first principles" every time. On the face of it, it feels like it should be correct; great investors are great because they've had to teach it to themselves from the ground up. What makes an investor great is their ability to think and diligence. 

It feels like some investment firms (especially smaller ones) are designed with the solo investor in mind. There is no firm standardized process, no standardized sourcing pipeline, no general training. To do so would constrain the investor, who needs to use their gut to make their decisions (so the thinking goes). A framework would stand in the way of intuition.

However ... when you assess funds as an LP, a lot of focus is on process: can we trust this manager to produce alpha, and do they have a replicable process? Larger investment consultants sit at the other extreme. There, the investment framework is truly a process: a 200-plus-line Excel spreadsheet of investment criteria, which is then synthesized into a memo. The intuition purists would say: yes, you've checked every box, but you've missed some je ne sais quoi about the investment, something not in your checklist that makes it stand out (or makes it fall apart). There are parallels to Atul Gawande's The Checklist Manifesto, where checklists improve surgery (and aviation safety) despite being initially despised by surgeons (and pilots, etc.).

All this to say: the investment firms that will be most successful with AI will be those that translate process into software-codified systems -- i.e. investment in AI will be all about process. One example: right now, as a firm, sourcing at some places feels spontaneous -- every investor has their own set of connections, resources, rules, etc. Some of this could/should be codified in agentic workflows; I've started to collect a list of "high quality" sources (e.g. a16z speedrun, etc.). Another example: we have a light investment memo template, but many questions are asked beyond what the template contains. These questions should ultimately be subsections of the template, which can then serve as a yardstick by which to measure investment diligence. This means updating a core investment framework template, to ensure that those questions are answered every time. 

On one hand, I hate it -- filling out a 200-question survey (and adding questions to it) seems to take the joy out of reading, learning, and investing. On the other hand, it's a bit embarrassing to miss certain pieces of diligence over and over again. And as a newer investor, it's a bit bewildering to be given tens of documents to read, without a rough mental framework in mind. 

Ultimately, I'm coming to believe that "investment experience" might just mean "I've built a really solid investment framework in my head." It means you know where to look first when doing diligence, or what the top questions are to ask -- i.e. which sections of the framework are raising red flags. So why not hand out this framework to earlier-career investors? I think it's ultimately what we'll be training investment LLMs on: a long-overdue reckoning with the fact that investment process is essential.

Sunday, May 10, 2026

Beginning a larger project: creating a CRM

 I've been putting off creating a CRM tool, but I think it's about time. Maybe CRM is the wrong word: I need a tool to store info about all companies, investors, people, etc. to be able to build on down the road. The tipping point: I want to create a sustainable sourcing framework that scrapes data from X source weekly. (Again, I am too "lazy" to do this weekly myself -- I'd rather list all the sources I want to pull from, and have it run automatically.) 

Part of this project is to see how long it takes to build something like this. I know a lot of folks are building these kinds of tools in Notion, but who wants to build a "Notion database" when you can build a more robust SQL one? I also think this sort of tooling is going to be critical scaffolding to build other AI tools on top of (e.g. document processing). 

My full vision makes this a little clearer:

1. Create core data models. Create data models for "organizations" (companies and investors), funds, people, programs (e.g. a16z speedrun, DARPA) and cohorts, as well as all the relationships between them (company-to-person, investor-to-company, program-to-company, etc.). Allow users to create/update these. All built in Supabase (backend), frontend in Streamlit (for now).

2. Create data pipelines. Populate companies from websites. For example, a16z speedrun cohort 6 was recent; the pipeline should be able to pull from their website and pull companies and relationships.

3. Audit trail. See who updated the companies (pipeline? user?) 

4. Add authentication, other usability tools. Let users log in via Google, email users when things are updated in the pipeline, make the scheduler work.

5. Add in investment details and company details. Right now we have the basics (basic company demographics, binary on whether an investment was made). Expand out to support a more full investment structure (e.g. X invested $Y in Z in Series A) as well as more robust company reporting ($X in revenue in 2025, etc.)

6. Add database partition? Find best way to make the database specific to a single investor, so that you can layer on multiple companies who can't see each others' data? (And try to stay in Streamlit for simplicity?)

7. Add in document processing. This is where it gets investor-specific. Process documents to extract revenue, investment #s, etc.

8. Add in other fun workflows? Allow users to connect their emails so you can see if you've emailed X company? Layer in automated meeting notes?
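
To make the data-model steps above a little more concrete, here is a rough sketch of a few of the entities and the fuller investment structure ("X invested $Y in Z in Series A"). It is an illustration under assumptions, not the actual Supabase tables: the class names, field names, and the `describe` helper are all hypothetical, and funds, programs, and cohorts are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    id: int
    name: str
    kind: str  # assumed values: "company" or "investor"


@dataclass
class Person:
    id: int
    name: str


@dataclass
class Relationship:
    """Generic link row, e.g. company-to-person or program-to-company."""
    from_type: str
    from_id: int
    to_type: str
    to_id: int
    role: str  # e.g. "founder", "board member", "cohort member"


@dataclass
class Investment:
    """Replaces a yes/no 'invested' flag with round and amount details."""
    investor_id: int
    company_id: int
    round_name: str
    amount_usd: Optional[float] = None  # None when the amount is unknown


def describe(inv, orgs):
    """Render an investment as a sentence; `orgs` maps id -> Organization."""
    amount = (f"${inv.amount_usd:,.0f}" if inv.amount_usd is not None
              else "an undisclosed amount")
    return (f"{orgs[inv.investor_id].name} invested {amount} "
            f"in {orgs[inv.company_id].name} in {inv.round_name}")


if __name__ == "__main__":
    orgs = {
        1: Organization(1, "Example Ventures", "investor"),
        2: Organization(2, "Acme Robotics", "company"),
    }
    print(describe(Investment(1, 2, "Series A", 2_000_000), orgs))
```

Keeping companies and investors in one `Organization` shape mirrors the "organizations" grouping in step 1; whether that holds up once funds and ownership percentages arrive is exactly the kind of design tradeoff flagged below.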


Hoping to be able to get through #1-3 this weekend; would be great proof-of-concept. The hard parts: database design (need well-designed database to withstand all future additions I'm planning!), knowing how to layer in the additions, knowing which order to tackle these in, knowing what the benefits/weaknesses of the tools (like Streamlit, SQL, scraping tools, etc.) are, hoping Google Antigravity is able to suss out what I want it to build with my instructions. 

Sunday, April 26, 2026

The various modes of investment work

I've worked on a few deals in the past month, and I now finally have a lull to think a little more about how to use AI to automate/replicate some of the work I've been doing. I've been thinking a lot about the types of work that I do. Investment memo work falls into a few buckets:

  • Information distillation - given a data room of information, distill it down into the main points
  • Information search and aggregation - given a topic, try to find as many pieces of info on it and aggregate them together
  • Information verification - given a piece of information, verify whether it's correct 
  • Financial analysis - digging into the historical financials as well as checking the assumptions in the financial projections
  • Anomaly detection - given a document or set of data, see if any piece of it is off/wrong
  • Interviews with founders, experts, customers, etc. - requires a lot more EQ and conversational skills, but also preparation to tee up the highest impact questions for the investment (and the best fitting questions for the interviewee)
This all feels a little abstract and unnecessary, but teasing out these modes of investment work is the way a developer would look to approach a broader fix to the problem. And, more importantly: AI is going to be used in a different way for each of these modes of work. I think that LLMs are strong enough that a lot of problems can be resolved through clever infrastructure (i.e. not just throwing everything into the chatbot and hoping it does everything we want). 

Some examples of the investment memo process that are particularly time-consuming:
  • Industry background - a lot of "information search and aggregation" -- Googling, ChatGPT'ing for trusted sources, listening to podcasts, reading consultant market maps, etc. Requires a lot of exploration and breadth; feels a little like climbing a mountain, and the surrounding landscape becomes clearer bit by bit. 
  • Competitors - "information search and aggregation" -- Google/ChatGPT for competitors, then look up info on each of them (market niche, traction, last fundraise, etc.)
  • Public comps and recent M&A - "information search and aggregation" - Google/ChatGPT for this info, then look each up (e.g. in Bloomberg/Yahoo Finance for public comps, internet verification for recent M&A)
  • Company (e.g. team, product, GTM, moats) - "information distillation" -- taking the whole data room and compressing it into a few pages of info. There's additional critical thinking (e.g. does their GTM actually make sense? are the moats really moats?), but otherwise a lot of depth here.
  • Term sheet review - most terms are typically standard (or within reason), so this is both "information distillation" (taking a 30-page legal document and boiling it down to a few key points) and "anomaly detection" (seeing if there are any particularly egregious or atypical terms). 
  • Traction/Financials - "Financial analysis", but then a lot of critical thinking and gut-checking on top of that.
  • Investment benefits and risks - this feels like it should be a lot about "gut" ... but I think an LLM could surface many good benefits/risks (and a human could then review and prioritize them)
I've been working on a verifiable, AI-driven "information distillation" process. Given a data room of information (or a pile of industry reports from McKinsey plus a few internet articles I dug up), can I get the AI to synthesize the info rooted in the files I give it? 

Part of me feels a little foolish for doing this: the Googles and Anthropics of the world are already doing this, and they have teams that are much smarter than me! However, I think the tech giants are solving a fundamentally different problem. Their chatbots are generally interested in: "can I answer any question given to me well?" and "if the user uploads documents, can I answer any question they ask about them?" The challenges are at least twofold: (1) the chatbot has to retrieve info about any topic I might throw at it, and (2) it has to be able to answer any question I ask. 

My little tool is much simpler: given documents, read the info and file it away neatly into folders that I ask it to. The process looks something like this:
  • Go through each document and extract information relevant to different buckets I give it (e.g. "company background," "team," "moats," etc.)
  • After processing all the documents, take one final pass to synthesize the extracted data together
Advantages: 
  1. You can apply this process to any document in the data room (just need to dial in the "folders" you file info into)
  2. The extracted info and process are verifiable (unlike typical LLMs which are black boxes)
  3. You can apply this process to many use cases -- just swap out the extraction schema and synthesis prompt
I'll have another post with more details, once I tidy up the development.
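
In the meantime, the two-pass process above (extract per document into buckets, then synthesize) can be sketched as follows. This is a minimal illustration, not the actual tool: `extract`, `synthesize`, and the stand-in functions are hypothetical names, and a simple keyword match takes the place of the LLM call so the sketch runs on its own.

```python
from collections import defaultdict


def extract(documents, buckets, extract_fn):
    """Pass 1: for each document and bucket, keep extracted snippets with their source file."""
    notes = defaultdict(list)
    for filename, text in documents.items():
        for bucket in buckets:
            for snippet in extract_fn(text, bucket):
                notes[bucket].append({"source": filename, "text": snippet})
    return dict(notes)


def synthesize(notes, synthesize_fn):
    """Pass 2: one synthesis call per bucket over everything extracted for it."""
    return {bucket: synthesize_fn(bucket, items) for bucket, items in notes.items()}


def keyword_extract(text, bucket):
    """Stand-in for an LLM extraction call: keep sentences that mention the bucket name."""
    return [s.strip() for s in text.split(".") if bucket.lower() in s.lower()]


def join_synthesize(bucket, items):
    """Stand-in for an LLM synthesis call: join snippets, citing each source file."""
    return "; ".join(f"{item['text']} [{item['source']}]" for item in items)


if __name__ == "__main__":
    docs = {
        "deck.pdf": "The team has two founders. Revenue doubled.",
        "memo.txt": "Their moats are patents. The team is based in Hartford.",
    }
    notes = extract(docs, ["team", "moats"], keyword_extract)
    for bucket, summary in synthesize(notes, join_synthesize).items():
        print(f"{bucket}: {summary}")
```

Because every snippet carries its source file, the intermediate `notes` can be inspected before synthesis, which is where the verifiability comes from. Swapping the bucket list and the two callables is all it takes to point the same loop at a different use case.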



Saturday, April 25, 2026

AI pulse check

Tech is changing week to week, so it seems like it would be good to journal the in-the-moment sentiment. Tech news stories -- especially those that get clicks -- tend to be sensational, absolutist, … and conveniently, good marketing for tech products. 

Token consumption: one thing I remember hearing a few months ago was “don’t think about token count at all when you build, LLMs will continue to get cheaper and go to zero.” I was a little skeptical -- I’ve gotten burned on cloud hosting costs, and am skittish about letting pay-as-you-go LLMs go unchecked. There’s been some pullback on this sentiment: (a) agents run amok, churning through tokens without real value, (b) announcements of companies trying to maximize their token usage, a clearly misguided goal, (c) realization that smaller models (and gasp Chinese models) may be good for some use cases, and (d) the realization that cheap tokens are subsidized by VC money, and over time prices might increase just as Uber prices have. 

Pure vibe coding enthusiasm has also been tempered. The market believed (believes?) that vibe coding would end SaaS, but it’s hitting the realization that software is ~20% code (the rest is infrastructure, marketing, support, installs, etc.). I buy that vibe coding is lowering the barrier for designers to build MVPs, but I don’t think it’ll be able to build full apps well within the next few years. There are too many software architecture decisions, etc., that will be hard for it to make.

The hot topics on my mind today: 

- Claude Cowork -- I need to explore this more; it’s gaining early adoption. I’m a little nervous about it having access to all local documents, but I can see the appeal of giving one provider access to all your docs.

- Skills -- excellent marketing term for a “template of instructions for the LLM.” Lowers the barrier for LLM entry, adds abstraction in a way non-programmers can understand. Need to play with this more to see if there’s more to it than that. 

- Process and clear instructions -- interestingly with “skills,” the hard part is the boring stuff: process, governance, clear instructions. I heard one good analogy: treat AI like a new employee -- you have to onboard it with culture, rules, norms, all the boring stuff. I think this will continue to be a trend; humans will be more and more important, just doing (what is to me more) boring work. But with AI, as with great companies, governance, structure, and feedback win.

- Death of SaaS -- world still seems split on whether SaaS is dead. Maybe it’s moving from a seat-based model to an outcomes-based model? I think SaaS dies if (a) you can build your own software or (b) there comes along a superapp that can do everything for you. I’ve tried (a), and that approach may work for a small subset of technical people (who probably were building their own tools pre-AI). I could imagine a world where all computer commands are routed through an Anthropic or OpenAI who has access to your whole digital world. Hard to tell now if this will be mass-adopted or just confined to the most technical. 

Saturday, April 18, 2026

Example of AI coding speed + implementation

My start-up, Transpose Health, has a core data conversion engine, but data work also has a lot of random stuff that needs to be done. AI coding has been transformative for this. What might've taken a few hours to code (and test) now takes a couple minutes to prompt ChatGPT for, then a minute for it to code. 

My prompt:

write me a python script that: 

 - opens up <excel file> (an excel file) 

 - loops through all CSVs in <folder> that are CSVs and do not contain the word "Copy" (table 2) 

 - inner joins table 1 to table 2 on "Legacy Rx" and "PrescriptionNumber" (to only get the rows from table 2 that are in table 1) 

 - prints out all of the messages from the "hl7_str" column in table 2 into a txt file (separate the "hl7_str" rows with a return character)

Of course, ChatGPT was not perfect the first time around. I had to (a) prompt ChatGPT to be less verbose (I needed a quick-and-dirty script I could review quickly, not an industrial-grade one), (b) run the script, (c) debug it (it had a small bug, so I had to do a deeper code review), and (d) test the outputs and iterate. Overall, the end-to-end process took ~30 minutes, but I saved significant time and brain energy on the coding part. 

(Interesting that my value-add shifts from understanding/writing the code to understanding how to prompt the LLM and QA the outputs. In other words: AI is taking all the technical implementation work from me, for better and for worse!)
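
For reference, a quick-and-dirty script in the spirit of that prompt might look like the sketch below. It is not the script ChatGPT produced. Two assumptions: the "Legacy Rx" values from the Excel file are passed in as a plain list (loading them, e.g. with pandas or openpyxl, is left out so the sketch needs only the standard library), and the join is done as a membership check, which keeps the table 2 rows whose key appears in table 1. The function name `collect_hl7` is hypothetical.

```python
import csv
from pathlib import Path


def collect_hl7(rx_numbers, folder, out_path):
    """Write hl7_str for every CSV row whose PrescriptionNumber is in rx_numbers.

    Returns the number of messages written.
    """
    wanted = {str(rx).strip() for rx in rx_numbers}  # "Legacy Rx" values from table 1
    messages = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        if "Copy" in csv_path.name:  # skip duplicate files such as "a - Copy.csv"
            continue
        with csv_path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row["PrescriptionNumber"].strip() in wanted:
                    messages.append(row["hl7_str"])
    # Separate messages with a newline, as the prompt asks
    Path(out_path).write_text("\n".join(messages), encoding="utf-8")
    return len(messages)
```

A call would look something like `collect_hl7(legacy_rx_values, "exports", "hl7_messages.txt")`, where the list, folder, and output file name are placeholders.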
