
AI Investment Pitch — Question Card

The questions to ask before writing a cheque — built around the AI stack framework

N43 Studio • AI & Investing Workshop • April 2026

How to use this handout: Start at the top — identify the body plan, then run the Service-as-Software test. Those two steps alone will eliminate half the pitches you see. For the remainder, work through the categories most relevant to the deal. Not every question applies to every company — pick the 8–10 that cut deepest. The red flags section is your walk-away checklist. The benchmarks bar gives you the numbers to hold founders accountable to.
Step 1 — Identify the body plan & apply the SaaS test
🧬

Identify the Body Plan First

Where does this company sit in the AI Cambrian? The answer changes everything about the thesis.
Horizontal AI
Used across all industries & roles. Moat = distribution. Think: Cursor, Perplexity, M365 Copilot. Question: can an incumbent absorb this in one update?
Vertical AI
Built for one industry. Moat = domain knowledge + workflow integration. Think: Harvey (legal), Abridge (medical). Question: does the founder understand the workflow personally?
Agentic AI
Acts autonomously, multi-step. Moat = narrow domain + outcome pricing. Think: Sierra, 11x, Decagon. Question: is this a narrow task or an open-ended "do my job" agent?
Infra / Picks & Shovels
Sells to the AI builders themselves. Moat = scarcity + pricing power. Think: Vertiv, Schneider, AMAT. Question: does everyone in the layer above need this — regardless of who wins?
The stack lens: Energy → Data Centers → Semis → Cloud → Foundation Models → Horizontal → Vertical → Agentic. Scarcity increases as you go down the stack. Optionality increases as you go up. Know which direction this company's moat runs.
⚖️

The Service-as-Software 3-Question Test

SaaS TAM = $300B. Services TAM = $5T. One test tells you which you're investing in.
1
Does it price per outcome, or per seat?
Per outcome (per resolved case, per matter, per conversation) = pricing against the services budget. Per seat = still fighting for IT budget. One is a $5T market; the other is $300B.
2
Is the economic buyer the CIO — or the head of legal, CMO, or COO?
If the buyer is the CIO, you're in a software fight (capped TAM). If it's an operational or professional services owner, you're in a services fight — 10x the pool, longer contracts, higher NRR.
3
Can the customer measure the ROI in dollars and minutes saved?
Quantifiable ROI = near-zero churn even in a downturn. Fuzzy ROI = first budget line cut. Harvey measures hours-per-matter. Abridge measures minutes-per-encounter. If they can't name the metric, they don't have it.
Score: 3 yeses = genuine Service-as-Software business — 10x contract ceiling vs SaaS. 2 yeses = positioned to get there; probe the pricing evolution. 1 or 0 = SaaS wrapper with AI features. Still investable — different thesis, different ceiling, different multiple.
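For readers who want the scoring note as an executable checklist, here is a minimal sketch. The function name and tier labels are invented for illustration; the thresholds follow the scoring note on this card.

```python
def saas_test(outcome_pricing: bool, services_buyer: bool, measurable_roi: bool) -> str:
    """Score the Service-as-Software 3-question test.

    Each argument is one 'yes' from the test above; tiers follow the
    card's scoring note (3 / 2 / 1-or-0 yeses).
    """
    yeses = sum([outcome_pricing, services_buyer, measurable_roi])
    if yeses == 3:
        return "Service-as-Software"  # 10x contract ceiling vs SaaS
    if yeses == 2:
        return "Transitional"         # probe the pricing evolution
    return "SaaS wrapper"             # different thesis, different ceiling

# e.g. outcome pricing + services buyer, but fuzzy ROI:
print(saas_test(True, True, False))  # Transitional
```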
Step 2 — Dig into the business
⚙️

Technology & Differentiation

Separate real AI from a GPT wrapper in 5 questions
1
What model(s) are you using — and what happens if that provider raises prices 10×?
100% API-dependent with no fine-tuning = a feature, not a company. If OpenAI adds this to ChatGPT tomorrow, are you gone?
2
What happens if I give your product the same prompt through ChatGPT?
Forces them to articulate actual technical differentiation. If there's no clear answer, there's no real moat.
3
Show me a failure case. When does your AI get it wrong, and how often?
Honest teams know their hallucination rate and edge cases cold. Evasion on this question is a red flag. Ask for the error rate on their specific use case, not a generic benchmark.
4
Do you fine-tune, use RAG, or both? Where does your training data come from?
Proprietary data + domain-specific fine-tuning = defensible. Pure prompt engineering + public data = fragile. Licensed or first-party data = safe. Scraped = litigation exposure.
5
What's your cited-source or grounding strategy for high-stakes domains?
Legal and medical AI must be citation-verifiable. Ungrounded outputs won't survive enterprise procurement. Ask how they handle the hallucination-vs-utility tradeoff in their vertical.
📊

Unit Economics & Margins

Inference costs can silently destroy the cap table
1
What's your gross margin after inference / compute costs?
Traditional SaaS: 70–85%. AI-native structural ceiling: 50–65% if API-dependent. Ask for real COGS — many AI companies flatter gross margin by booking compute, which can run 40–60% of revenue, outside the cost-of-revenue line.
2
What does it cost to serve your median customer per month vs. your 90th-percentile power user?
Token usage variance is enormous. One power user can blow up unit economics for 20 median users. The shape of this distribution matters as much as the average.
3
Does your pricing model scale with your cost curve — or against it?
Per-seat pricing with usage-based costs = dangerous. As customers use more, you lose more margin. Per-outcome or consumption pricing = self-aligning. Which side of this are you on?
4
What's your CAC payback period — and does your model provider's pricing factor into that calculation?
CAC payback over 18 months in an API-dependent company = dangerous. Model costs may rise before you recoup. If they're on OpenAI/Anthropic APIs, they don't fully control their own COGS.
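The margin and payback math in this category reduces to two one-line formulas. A minimal sketch, with hypothetical numbers chosen to land inside the bands quoted above (the figures are made up for illustration, not drawn from any real company):

```python
def gross_margin(revenue: float, inference_cogs: float, other_cogs: float = 0.0) -> float:
    """Gross margin after inference/compute costs, as a fraction of revenue."""
    return (revenue - inference_cogs - other_cogs) / revenue

def cac_payback_months(cac: float, monthly_revenue: float, gm: float) -> float:
    """Months to recoup CAC out of gross profit, not top-line revenue."""
    return cac / (monthly_revenue * gm)

# Hypothetical customer: $100k ARR, $45k/yr of inference spend, $30k CAC
gm = gross_margin(revenue=100_000, inference_cogs=45_000)
# 0.55 -> inside the 50-65% API-dependent band
payback = cac_payback_months(cac=30_000, monthly_revenue=100_000 / 12, gm=gm)
# ~6.5 months -> comfortably under the 18-month danger line
```

Note that payback is computed on gross profit: at a 55% margin the same CAC takes nearly twice as long to recoup as a naive revenue-based calculation suggests.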
🔐

The Data Question (It's Not What You Think)

Data moats are being commoditized by synthetic data — ask the right version
1
Does using your product generate proprietary data that improves the model over time?
The flywheel: more users → more labeled output data → better model → more users. Without compounding data, there's no moat — just a feature. Ask for the feedback loop, not the dataset size.
2
Can your data moat be replicated with synthetic data within 12 months?
Synthetic data now achieves ~94% of real data quality at 60% of the cost. "We have 50M records" is not a moat if those records can be generated. What makes your data unreplicable?
3
Do you have legal rights to your training data? What's your exposure to copyright litigation?
Copyright litigation against AI training data is accelerating (NYT v. OpenAI being the template). Licensed or first-party data = safe. Web-scraped content = increasing liability.
4
If Anthropic or OpenAI shipped a competing vertical feature tomorrow, what would you still have?
The platform risk question. The honest answer should be: proprietary data pipeline, workflow integration depth, customer switching costs, and domain-specific accuracy that a generalist can't match in 6 months.
🔗

Stickiness & NRR

Retention separates the Cambrian survivors from the extinct body plans
1
What's your net revenue retention — and how does it compare to the 118% enterprise SaaS benchmark?
AI-native SaaS median NRR is 48%; premium AI tools ($250+/month) are at 85%; strong enterprise = 110%+. Below 80% = churn outpacing expansion. Ask for GRR (gross) separately from NRR (net).
2
What does a customer lose if they churn tomorrow?
Custom fine-tuned models? Workflow history? Institutional knowledge? Integration depth? If the answer is "they just log out," switching cost is zero and NRR will crater eventually.
3
Show me your DAU/MAU ratio. How often do customers return?
Below 30% DAU/MAU = tool of convenience, not necessity. For agentic products, this metric is less meaningful — ask for tasks-completed-per-week instead. Frequency of use is the stickiness proxy.
4
For agentic companies: what's your resolved-task rate vs. human escalation rate?
Sierra targets 60–80% autonomous resolution. Below 40% = the agent isn't ready for production. Above 90% = either the tasks are too easy, or they're not measuring failures honestly.
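Since the card asks for GRR and NRR separately, here is how the two differ in a minimal sketch (cohort figures are hypothetical, chosen to hit the 80% and 110% thresholds quoted above):

```python
def grr(start_arr: float, churned: float, downgraded: float) -> float:
    """Gross revenue retention: what survives churn and contraction. No expansion."""
    return (start_arr - churned - downgraded) / start_arr

def nrr(start_arr: float, churned: float, downgraded: float, expansion: float) -> float:
    """Net revenue retention: same cohort, but expansion revenue counts."""
    return (start_arr - churned - downgraded + expansion) / start_arr

# Hypothetical cohort: $1M starting ARR, $150k churned, $50k downgraded, $300k expanded
print(grr(1_000_000, 150_000, 50_000))            # 0.8  -> the floor on this card
print(nrr(1_000_000, 150_000, 50_000, 300_000))   # 1.1  -> the "strong enterprise" bar
```

The gap between the two numbers is the point: a 110% NRR built on 80% GRR means heavy expansion is papering over heavy churn — ask for both.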
👥

Team & Domain Credibility

AI talent + domain knowledge = the rarest combination
1
Does the founding team have domain expertise in the vertical they're targeting?
Harvey's founder quit as a litigator to build it. Abridge's CEO is a practicing cardiologist. AI engineers without domain knowledge build solutions looking for problems. This is the single most predictive team question for vertical AI.
2
Who on the team has shipped production AI at scale — not just prototypes or demos?
Demo ≠ production. Getting AI working for 100 beta users ≠ getting it working for 100,000 paying enterprise customers. Ask about production incidents, latency SLAs, and failover strategies.
3
Is the team diverse in background — technical depth and domain expertise represented?
Homogeneous teams (all ML engineers, no domain experts) are a red flag for vertical AI. You need someone who has sat in the chair the customer sits in. "We hired a consultant" doesn't count.
4
What's the team's relationship to their first 10 customers? Can I talk to two of them?
Reference calls are the highest-signal due diligence in early AI. Ask customers: "Would you work with them again?" and "What's broken?" The answers to the second question tell you more than any pitch deck.
🛡️

Risk, Privacy & Compliance

Regulatory velocity is accelerating — ask before the IC
1
Where does customer data go? Is any of it used to improve base models?
Zero data retention (ZDR) is table stakes for enterprise. Enterprise contracts at OpenAI/Anthropic are firewalled from base-model training — consumer products are not. Know which you're building on.
2
Are you classified as high-risk under the EU AI Act? How are you handling it?
High-risk systems (legal, medical, HR) face mandatory audits, conformity assessments, and fines up to 7% of global revenue. Effective August 2025. If they haven't budgeted compliance, add it to your risk model.
3
What's your hallucination rate on domain-specific tasks — not generic benchmarks?
MMLU and HellaSwag scores mean nothing for a legal AI tool. Ask for accuracy on their actual customer workflows. If they can't produce a domain-specific eval, they don't have one.
4
What happens to my data and model outputs if you run out of money?
90% of AI startups will fail. Data portability, model export rights, and escrow clauses should be standard in enterprise contracts. If they don't have answers to this, their enterprise customers haven't done real procurement.
Reference benchmarks — hold founders to these numbers

📐 2025–2026 AI Company Benchmarks

Gross Margin Target
60–70%
Premium AI tools ($250+/mo). API-dependent ceiling is structurally 50–65%. Below 50% = compute is eating the business.
NRR — Strong
110%+
Enterprise AI. Median AI-native is 48%; premium tools 85%. Below 80% = churn outpacing expansion.
Service-as-Software
10×
Contract size vs. equivalent SaaS. Harvey = $1–5M/firm/yr vs $300K for legacy legal SaaS. Same customer, 10–30× revenue.
Agentic Resolution Rate
60–80%
Autonomous task resolution target (Sierra benchmark). Below 40% = not production-ready. Above 90% = suspiciously narrow tasks.
ARR Ramp Speed
$100M
Service-as-Software companies are reaching $100M ARR 4–5× faster than traditional vertical SaaS. Harvey: $0→$50M ARR in 18 months.
Vertical AI Moat
6–12 mo
Typical defensibility horizon before competitive parity (without active flywheel). Data quality > data quantity now.
Walk-away checklist

⚠️ Walk-Away Red Flags

Can't place themselves on the AI stack. If they don't know whether they're infrastructure, horizontal, vertical, or agentic, they don't have a thesis — they have a product.
Per-seat pricing with no roadmap to outcome pricing. Signals they're pricing against IT budgets, not services budgets. The TAM ceiling is 10× lower than it needs to be for the valuation.
No measurable ROI metric for customers. If the customer can't explain what it saved them in hours or dollars, churn is a question of when, not if.
100% API-dependent, no fine-tuning, no proprietary pipeline. A prompt wrapper. If OpenAI adds one feature, the company evaporates. This is the "Jasper AI" failure mode.
Demo works; production is unverified. Ask to run your own data through the system live. If they resist, that's the answer. Devin benchmarked at 14% real-world task completion despite a viral demo.
Gross margin below 50% after inference. Structurally dangerous. Compute costs will compress further at scale before they improve. Model provider pricing power over COGS is the hidden risk.
No domain expert on the founding team for a vertical play. AI engineers alone can't identify the right problem to solve in legal, medical, or finance. The founders should have sat in the customer's chair.
TAM = the entire industry. "Legal AI is a $500B market" is not a TAM. The TAM is their reachable segment at their current pricing model. Inflated TAM = they haven't done the bottoms-up.
Customer data used for training without explicit opt-in. Regulatory liability and enterprise deal-killer. Enterprise customers at the major model providers are contractually firewalled — this should be standard.
Open-ended "general agent" pitch with no production deployments. Narrow agents work now. General agents do not. If they can't name 5 specific task types their agent handles reliably, it's still a demo.
NRR below 80%. Churn is outpacing expansion. In an AI product that's supposed to get stickier as it learns customer workflows, declining NRR is a signal the learning isn't happening.
"AI" used 20+ times in the deck with no technical depth. Buzzword density inversely correlates with substance. Ask the technical question they haven't prepared for: "Show me your eval framework."
Quick scoring

★ Quick Scorecard — Rate 1–5 on Each Dimension  ·  Total under 18 = pass

🧬
Body Plan Clarity
Does it fit cleanly into horizontal, vertical, agentic, or infra?
⚖️
SaaS Test Score
3-question test: outcome pricing, services buyer, measurable ROI?
🔗
Workflow Depth
Copilot sitting beside a process, or system of record replacing it?
🔐
Data Compounding
Is there a real flywheel, or a static dataset that synthetics can replicate?
👥
Team Credibility
Domain expertise + technical depth + production track record?
📊
Unit Economics
Margins survive at 10× scale? Pricing model tracks the cost curve?
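The scorecard above can be totaled mechanically. A minimal sketch — the dimension keys and function name are invented here; the 1–5 scale and the under-18 pass rule come from the card:

```python
DIMENSIONS = [
    "body_plan_clarity", "saas_test_score", "workflow_depth",
    "data_compounding", "team_credibility", "unit_economics",
]

def score_pitch(ratings: dict) -> tuple:
    """Total a 1-5 rating per dimension. Under 18 = pass (walk away)."""
    assert set(ratings) == set(DIMENSIONS), "rate every dimension"
    assert all(1 <= r <= 5 for r in ratings.values()), "ratings are 1-5"
    total = sum(ratings.values())
    return total, ("pass" if total < 18 else "keep diligencing")

# Hypothetical pitch: strong team and data flywheel, weak unit economics
total, verdict = score_pitch({
    "body_plan_clarity": 4, "saas_test_score": 3, "workflow_depth": 3,
    "data_compounding": 4, "team_credibility": 5, "unit_economics": 2,
})
print(total, verdict)  # 21 keep diligencing
```

With six dimensions the range is 6–30, so the under-18 cutoff demands an average of 3 across the board: one standout dimension can't rescue a pitch that scores 1s and 2s everywhere else.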