
AI Investment Pitch — Question Card

The questions to ask before writing a cheque — built around the AI stack framework

N43 Studio • AI & Investing Workshop • April 2026

How to use this handout: Start at the top — identify the body plan, then run the Service-as-Software test. Those two steps alone will eliminate half the pitches you see. For the remainder, work through the categories most relevant to the deal. Not every question applies to every company — pick the 8–10 that cut deepest. The red flags section is your walk-away checklist. The benchmarks bar gives you the numbers to hold founders accountable to.
Step 1 — Identify the body plan & apply the SaaS test
🧬

Identify the Body Plan First

Where does this company sit in the AI Cambrian? The answer changes everything about the thesis.
Horizontal AI
Used across all industries & roles. Moat = distribution. Think: Cursor, Perplexity, M365 Copilot. Question: can an incumbent absorb this in one update?
Vertical AI
Built for one industry. Moat = domain knowledge + workflow integration. Think: Harvey (legal), Abridge (medical). Question: does the founder understand the workflow personally?
Agentic AI
Acts autonomously, multi-step. Moat = narrow domain + outcome pricing. Think: Sierra, 11x, Decagon. Question: is this a narrow task or an open-ended "do my job" agent?
Infra / Picks & Shovels
Sells to the AI builders themselves. Moat = scarcity + pricing power. Think: Vertiv, Schneider, AMAT. Question: does everyone in the layer above need this — regardless of who wins?
The stack lens: Energy → Data Centers → Semis → Cloud → Foundation Models → Horizontal → Vertical → Agentic. Scarcity increases as you go down the stack. Optionality increases as you go up. Know which direction this company's moat runs.
⚖️

The Service-as-Software 3-Question Test

SaaS TAM = $300B. Services TAM = $5T. One test tells you which you're investing in.
1
Does it price per outcome, or per seat?
Per outcome (per resolved case, per matter, per conversation) = pricing against the services budget. Per seat = still fighting for IT budget. One is a $5T market; the other is $300B.
2
Is the economic buyer the CIO — or the head of legal, CMO, or COO?
If the buyer is the CIO, you're in a software fight (capped TAM). If it's an operational or professional services owner, you're in a services fight — 10x the pool, longer contracts, higher NRR.
3
Can the customer measure the ROI in dollars and minutes saved?
Quantifiable ROI = near-zero churn even in a downturn. Fuzzy ROI = first budget line cut. Harvey measures hours-per-matter. Abridge measures minutes-per-encounter. If they can't name the metric, they don't have it.
Score: 3 yeses = genuine Service-as-Software business — 10x contract ceiling vs SaaS. 2 yeses = positioned to get there; probe the pricing evolution. 1 or 0 = SaaS wrapper with AI features. Still investable — different thesis, different ceiling, different multiple.
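For readers who want the scoring note as an executable checklist, here is a minimal sketch. The function name and tier labels are invented for illustration; the thresholds follow the scoring note on this card.

```python
def saas_test(outcome_pricing: bool, services_buyer: bool, measurable_roi: bool) -> str:
    """Score the Service-as-Software 3-question test.

    Each argument is one 'yes' from the test above; tiers follow the
    card's scoring note (3 / 2 / 1-or-0 yeses).
    """
    yeses = sum([outcome_pricing, services_buyer, measurable_roi])
    if yeses == 3:
        return "Service-as-Software"  # 10x contract ceiling vs SaaS
    if yeses == 2:
        return "Transitional"         # probe the pricing evolution
    return "SaaS wrapper"             # different thesis, different ceiling

# e.g. outcome pricing + services buyer, but fuzzy ROI:
print(saas_test(True, True, False))  # Transitional
```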
Step 2 — Dig into the business
⚙️

Technology & Differentiation

Separate real AI from a GPT wrapper in 5 questions
1
What model(s) are you using — and what happens if that provider raises prices 10×?
100% API-dependent with no fine-tuning = a feature, not a company. If OpenAI adds this to ChatGPT tomorrow, are you gone?
2
What happens if I give your product the same prompt through ChatGPT?
Forces them to articulate actual technical differentiation. If there's no clear answer, there's no real moat.
3
Show me a failure case. When does your AI get it wrong, and how often?
Honest teams know their hallucination rate and edge cases cold. Evasion on this question is a red flag. Ask for the error rate on their specific use case, not a generic benchmark.
4
Do you fine-tune, use RAG, or both? Where does your training data come from?
Proprietary data + domain-specific fine-tuning = defensible. Pure prompt engineering + public data = fragile. Licensed or first-party data = safe. Scraped = litigation exposure.
5
What's your cited-source or grounding strategy for high-stakes domains?
Legal and medical AI must be citation-verifiable. Ungrounded outputs won't survive enterprise procurement. Ask how they handle the hallucination-vs-utility tradeoff in their vertical.
📊

Unit Economics & Margins

Inference costs can silently destroy the cap table
1
What's your gross margin after inference / compute costs?
Traditional SaaS: 70–85%. AI-native structural ceiling: 50–65% if API-dependent. Ask for real COGS — many AI companies flatter gross margin by booking compute, which can run 40–60% of revenue, outside the cost-of-revenue line.
2
What does it cost to serve your median customer per month vs. your 90th-percentile power user?
Token usage variance is enormous. One power user can blow up unit economics for 20 median users. The shape of this distribution matters as much as the average.
3
Does your pricing model scale with your cost curve — or against it?
Per-seat pricing with usage-based costs = dangerous. As customers use more, you lose more margin. Per-outcome or consumption pricing = self-aligning. Which side of this are you on?
4
What's your CAC payback period — and does your model provider's pricing factor into that calculation?
CAC payback over 18 months in an API-dependent company = dangerous. Model costs may rise before you recoup. If they're on OpenAI/Anthropic APIs, they don't fully control their own COGS.
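The margin and payback math in this category reduces to two one-line formulas. A minimal sketch, with hypothetical numbers chosen to land inside the bands quoted above (the figures are made up for illustration, not drawn from any real company):

```python
def gross_margin(revenue: float, inference_cogs: float, other_cogs: float = 0.0) -> float:
    """Gross margin after inference/compute costs, as a fraction of revenue."""
    return (revenue - inference_cogs - other_cogs) / revenue

def cac_payback_months(cac: float, monthly_revenue: float, gm: float) -> float:
    """Months to recoup CAC out of gross profit, not top-line revenue."""
    return cac / (monthly_revenue * gm)

# Hypothetical customer: $100k ARR, $45k/yr of inference spend, $30k CAC
gm = gross_margin(revenue=100_000, inference_cogs=45_000)
# 0.55 -> inside the 50-65% API-dependent band
payback = cac_payback_months(cac=30_000, monthly_revenue=100_000 / 12, gm=gm)
# ~6.5 months -> comfortably under the 18-month danger line
```

Note that payback is computed on gross profit: at a 55% margin the same CAC takes nearly twice as long to recoup as a naive revenue-based calculation suggests.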
🔐

The Data Question (It's Not What You Think)

Data moats are being commoditized by synthetic data — ask the right version
1
Does using your product generate proprietary data that improves the model over time?
The flywheel: more users → more labeled output data → better model → more users. Without compounding data, there's no moat — just a feature. Ask for the feedback loop, not the dataset size.
2
Can your data moat be replicated with synthetic data within 12 months?
Synthetic data now achieves ~94% of real data quality at 60% of the cost. "We have 50M records" is not a moat if those records can be generated. What makes your data unreplicable?
3
Do you have legal rights to your training data? What's your exposure to copyright litigation?
Copyright litigation against AI training data is accelerating (NYT v. OpenAI being the template). Licensed or first-party data = safe. Web-scraped content = increasing liability.
4
If Anthropic or OpenAI shipped a competing vertical feature tomorrow, what would you still have?
The platform risk question. The honest answer should be: proprietary data pipeline, workflow integration depth, customer switching costs, and domain-specific accuracy that a generalist can't match in 6 months.
🔗

Stickiness & NRR

Retention separates the Cambrian survivors from the extinct body plans
1
What's your net revenue retention — and how does it compare to the 118% enterprise SaaS benchmark?
AI-native SaaS median NRR is 48%; premium AI tools ($250+/month) are at 85%; strong enterprise = 110%+. Below 80% = churn outpacing expansion. Ask for GRR (gross) separately from NRR (net).
2
What does a customer lose if they churn tomorrow?
Custom fine-tuned models? Workflow history? Institutional knowledge? Integration depth? If the answer is "they just log out," switching cost is zero and NRR will crater eventually.
3
Show me your DAU/MAU ratio. How often do customers return?
Below 30% DAU/MAU = tool of convenience, not necessity. For agentic products, this metric is less meaningful — ask for tasks-completed-per-week instead. Frequency of use is the stickiness proxy.
4
For agentic companies: what's your resolved-task rate vs. human escalation rate?
Sierra targets 60–80% autonomous resolution. Below 40% = the agent isn't ready for production. Above 90% = either the tasks are too easy, or they're not measuring failures honestly.
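Since the card asks for GRR and NRR separately, here is how the two differ in a minimal sketch (cohort figures are hypothetical, chosen to hit the 80% and 110% thresholds quoted above):

```python
def grr(start_arr: float, churned: float, downgraded: float) -> float:
    """Gross revenue retention: what survives churn and contraction. No expansion."""
    return (start_arr - churned - downgraded) / start_arr

def nrr(start_arr: float, churned: float, downgraded: float, expansion: float) -> float:
    """Net revenue retention: same cohort, but expansion revenue counts."""
    return (start_arr - churned - downgraded + expansion) / start_arr

# Hypothetical cohort: $1M starting ARR, $150k churned, $50k downgraded, $300k expanded
print(grr(1_000_000, 150_000, 50_000))            # 0.8  -> the floor on this card
print(nrr(1_000_000, 150_000, 50_000, 300_000))   # 1.1  -> the "strong enterprise" bar
```

The gap between the two numbers is the point: a 110% NRR built on 80% GRR means heavy expansion is papering over heavy churn — ask for both.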
👥

Team & Domain Credibility

AI talent + domain knowledge = the rarest combination
1
Does the founding team have domain expertise in the vertical they're targeting?
Harvey's founder quit as a litigator to build it. Abridge's CEO is a practicing cardiologist. AI engineers without domain knowledge build solutions looking for problems. This is the single most predictive team question for vertical AI.
2
Who on the team has shipped production AI at scale — not just prototypes or demos?
Demo ≠ production. Getting AI working for 100 beta users ≠ getting it working for 100,000 paying enterprise customers. Ask about production incidents, latency SLAs, and failover strategies.
3
Is the team diverse in background — technical depth and domain expertise represented?
Homogeneous teams (all ML engineers, no domain experts) are a red flag for vertical AI. You need someone who has sat in the chair the customer sits in. "We hired a consultant" doesn't count.
4
What's the team's relationship to their first 10 customers? Can I talk to two of them?
Reference calls are the highest-signal due diligence in early AI. Ask customers: "Would you work with them again?" and "What's broken?" The answers to the second question tell you more than any pitch deck.
🛡️

Risk, Privacy & Compliance

Regulatory velocity is accelerating — ask before the IC
1
Where does customer data go? Is any of it used to improve base models?
Zero data retention (ZDR) is table stakes for enterprise. Enterprise contracts at OpenAI/Anthropic are firewalled from base-model training — consumer products are not. Know which you're building on.
2
Are you classified as high-risk under the EU AI Act? How are you handling it?
High-risk systems (legal, medical, HR) face mandatory audits, conformity assessments, and fines up to 7% of global revenue. Effective August 2025. If they haven't budgeted compliance, add it to your risk model.
3
What's your hallucination rate on domain-specific tasks — not generic benchmarks?
MMLU and HellaSwag scores mean nothing for a legal AI tool. Ask for accuracy on their actual customer workflows. If they can't produce a domain-specific eval, they don't have one.
4
What happens to my data and model outputs if you run out of money?
90% of AI startups will fail. Data portability, model export rights, and escrow clauses should be standard in enterprise contracts. If they don't have answers to this, their enterprise customers haven't done real procurement.
Reference benchmarks — hold founders to these numbers

📐 2025–2026 AI Company Benchmarks

Gross Margin Target
60–70%
Premium AI tools ($250+/mo). API-dependent ceiling is structurally 50–65%. Below 50% = compute is eating the business.
NRR — Strong
110%+
Enterprise AI. Median AI-native is 48%; premium tools 85%. Below 80% = churn outpacing expansion.
Service-as-Software
10×
Contract size vs. equivalent SaaS. Harvey = $1–5M/firm/yr vs $300K for legacy legal SaaS. Same customer, 10–30× revenue.
Agentic Resolution Rate
60–80%
Autonomous task resolution target (Sierra benchmark). Below 40% = not production-ready. Above 90% = suspiciously narrow tasks.
ARR Ramp Speed
$100M
Service-as-Software companies are reaching $100M ARR 4–5× faster than traditional vertical SaaS. Harvey: $0→$50M ARR in 18 months.
Vertical AI Moat
6–12 mo
Typical defensibility horizon before competitive parity (without active flywheel). Data quality > data quantity now.
Walk-away checklist

⚠️ Walk-Away Red Flags

Can't place themselves on the AI stack. If they don't know whether they're infrastructure, horizontal, vertical, or agentic, they don't have a thesis — they have a product.
Per-seat pricing with no roadmap to outcome pricing. Signals they're pricing against IT budgets, not services budgets. The TAM ceiling is 10× lower than it needs to be for the valuation.
No measurable ROI metric for customers. If the customer can't explain what it saved them in hours or dollars, churn is a question of when, not if.
100% API-dependent, no fine-tuning, no proprietary pipeline. A prompt wrapper. If OpenAI adds one feature, the company evaporates. This is the "Jasper AI" failure mode.
Demo works; production is unverified. Ask to run your own data through the system live. If they resist, that's the answer. Devin benchmarked at 14% real-world task completion despite a viral demo.
Gross margin below 50% after inference. Structurally dangerous. Compute costs will compress further at scale before they improve. Model provider pricing power over COGS is the hidden risk.
No domain expert on the founding team for a vertical play. AI engineers alone can't identify the right problem to solve in legal, medical, or finance. The founders should have sat in the customer's chair.
TAM = the entire industry. "Legal AI is a $500B market" is not a TAM. The TAM is their reachable segment at their current pricing model. Inflated TAM = they haven't done the bottoms-up.
Customer data used for training without explicit opt-in. Regulatory liability and enterprise deal-killer. Enterprise customers at the major model providers are contractually firewalled — this should be standard.
Open-ended "general agent" pitch with no production deployments. Narrow agents work now. General agents do not. If they can't name 5 specific task types their agent handles reliably, it's still a demo.
NRR below 80%. Churn is outpacing expansion. In an AI product that's supposed to get stickier as it learns customer workflows, declining NRR is a signal the learning isn't happening.
"AI" used 20+ times in the deck with no technical depth. Buzzword density inversely correlates with substance. Ask the technical question they haven't prepared for: "Show me your eval framework."
Quick scoring

★ Quick Scorecard — Rate 1–5 on Each Dimension  ·  Total under 18 = pass

🧬
Body Plan Clarity
Does it fit cleanly into horizontal, vertical, agentic, or infra?
⚖️
SaaS Test Score
3-question test: outcome pricing, services buyer, measurable ROI?
🔗
Workflow Depth
Copilot sitting beside a process, or system of record replacing it?
🔐
Data Compounding
Is there a real flywheel, or a static dataset that synthetics can replicate?
👥
Team Credibility
Domain expertise + technical depth + production track record?
📊
Unit Economics
Margins survive at 10× scale? Pricing model tracks the cost curve?
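The scorecard above can be totaled mechanically. A minimal sketch — the dimension keys and function name are invented here; the 1–5 scale and the under-18 pass rule come from the card:

```python
DIMENSIONS = [
    "body_plan_clarity", "saas_test_score", "workflow_depth",
    "data_compounding", "team_credibility", "unit_economics",
]

def score_pitch(ratings: dict) -> tuple:
    """Total a 1-5 rating per dimension. Under 18 = pass (walk away)."""
    assert set(ratings) == set(DIMENSIONS), "rate every dimension"
    assert all(1 <= r <= 5 for r in ratings.values()), "ratings are 1-5"
    total = sum(ratings.values())
    return total, ("pass" if total < 18 else "keep diligencing")

# Hypothetical pitch: strong team and data flywheel, weak unit economics
total, verdict = score_pitch({
    "body_plan_clarity": 4, "saas_test_score": 3, "workflow_depth": 3,
    "data_compounding": 4, "team_credibility": 5, "unit_economics": 2,
})
print(total, verdict)  # 21 keep diligencing
```

With six dimensions the range is 6–30, so the under-18 cutoff demands an average of 3 across the board: one standout dimension can't rescue a pitch that scores 1s and 2s everywhere else.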