Subscription Shock vs Usage Drift: The AI Cost Risk Most Companies Miss
A version of this argument has been moving through LinkedIn for two weeks, and several of our clients have asked us about it directly. The clearest articulation came in a post by Rachel De Sain, originally credited to James Martin: a fifty-person team running on Claude Pro pays a thousand dollars a month, but the same usage at API token rates would run fifteen to forty thousand. Melissa Rosenthal followed with the framing that every AI lab is losing money on every customer and the bill is coming. A few days later, State of Brand published "Every AI Subscription Is a Ticking Time Bomb for Enterprise" and formalized the argument with industry data.
The math is largely accurate. The conclusion needs sharpening.
What the discourse gets right: consumer subscription pricing is subsidized, and the gap between price and cost is going to close. What it gets wrong: that risk doesn't apply uniformly to every company building with AI. The pricing model that the panic posts are about is not the pricing model most AI products are built on, and the two are moving in opposite directions. This post is our considered, written-once answer for anyone who's been asking.
Is the AI subscription subsidy real?
The subsidy is real. The evidence has been accumulating for eighteen months.
In January 2025, Sam Altman acknowledged publicly that OpenAI was losing money on $200/month ChatGPT Pro subscriptions. The line that landed was, "people use it much more than we expected." Microsoft was reportedly absorbing roughly $80 of monthly compute on $10/month GitHub Copilot seats for power users (WSJ via The Register). GitHub announced earlier this year that Copilot is moving to usage-based billing on June 1, 2026, citing agentic workloads breaking the flat-rate model explicitly. Microsoft 365 has raised prices twice in four years, with the most recent round tied directly to AI infrastructure costs. Both Anthropic and OpenAI are reportedly preparing for IPOs, which brings public-market scrutiny to unit economics that have been venture-subsidized through the buildout.
We concede the diagnosis. Consumer and SMB flat-fee subscription pricing (twenty-dollar Pro tiers, hundred-to-two-hundred-dollar Max tiers, per-seat enterprise plans) is a loss leader. Heavy users consume multiples of what they pay. That gap is going to close. Companies whose AI strategy is "give every employee a Claude Pro or ChatGPT Plus seat" are exposed to a repricing event when it does.
How big is the gap? Bigger than the "10x" framing suggests. A heavy user on a $200/month Claude Max plan can pull through somewhere between $5,000 and $7,500 of inference at API rates, putting the effective subsidy at 25-40x for that population (community estimate via Theo Browne; Anthropic hasn't published official numbers). And the subsidy isn't being absorbed out of charity. The economic structure looks more like a marketing spend: heavy consumer-plan use creates developer habits that follow people into their day jobs, where the same workloads run on enterprise API contracts with no subsidy at all. The consumer side is funded by the enterprise side. That's why both Anthropic and OpenAI can sustain $200/month plans that lose money per heavy user. It's also why the gap is going to close from the consumer-plan end rather than the API end: that's the side where the unit economics don't survive scrutiny.
This is the honest part. The diagnosis is right as far as it goes.
Why subscription and API pricing are moving in opposite directions
The discourse treats AI cost as a single risk pool. It isn't. There are two distinct pricing models, and they are moving in opposite directions.
Consumer and SMB subscriptions (the panic-post category) are subsidized loss-leaders for market share. The gap between what the heaviest users consume and what they pay is real, and the closing of that gap is what the discourse is naming.
API token pricing is the other model, and it has been moving down, not up. Anthropic cut Opus pricing 67% at the Opus 4.5 launch (from $15 and $75 to $5 and $25 per million input and output tokens) and held the line through 4.6 and 4.7. Sonnet has been stable at $3 and $15. Haiku at $1 and $5. On top of those rates, prompt caching cuts cached input cost by up to 90%, and the Batch API is 50% off. A workload built thoughtfully can land 95% below the un-optimized baseline on the same problem.
Why the two move in opposite directions: API pricing is already much closer to true compute cost. The subsidy gap is concentrated in flat-fee subscriptions where one heavy user can outconsume their seat price by ten or twenty times. API customers pay for what they actually use, so the gap between price and cost is much smaller to begin with. The IPO scrutiny that's about to discipline subscription pricing has already disciplined API pricing.
The implication for builders: if your product is built on the API, the "subsidy collapse" risk applies to you in a fundamentally different way than to a company that handed out five hundred ChatGPT Plus seats. Conflating the two is the mistake the discourse is making.
What is usage drift in AI products?
There is a real risk for API-based products. It's just not the one being discussed.
Call it usage drift. Even with stable or declining per-token prices, the number of tokens consumed per unit of customer value is on a one-way trajectory upward. Five forces compound at once.
Agentic autonomy. A single user request can trigger 20 to 100+ inference calls instead of one question-and-answer turn. Claude Code users have reported burning a five-hour rate-limit window in under 90 minutes during heavy agentic sessions. The same "task" consumes orders of magnitude more inference than it did a year ago.
Tool-use loops. Function calls, web searches, file operations, MCP connectors. Each round trip adds tokens in both directions as the model ingests tool output and decides what to do next. A task that needs three tool calls is roughly three times the token count of one without them, before counting the iteration overhead.
Context inflation. RAG retrieval, conversation memory, file uploads, system prompts that have grown from two hundred tokens to five thousand and counting. Every call pays the input cost on whatever bloated context you ship with it. The cheapest token is the one you didn't send.
Reasoning tokens. Modern frontier models burn invisible "thinking" tokens before producing visible output. Those are billed as output tokens, which are five times the input rate. A complex query can consume 30,000 reasoning tokens before generating a 1,000-token answer the user actually sees. The bill counts them.
Capability creep. This is the quietest force and probably the biggest. As models get more capable, what your product asks them to do gets more substantial. "Summarize this document" becomes "summarize, draft a response, update the CRM, and schedule the follow-up." Each unit of perceived customer value consumes more inference than it did six months ago, even if you ship zero new code. Your users push the ceiling, and the model meets them there.
The implication for gross margin: a product that costs ten cents per active user per day to serve today may cost thirty cents to a dollar per user per day in twelve months. Not because token prices went up. Because the product got better.
That is the risk worth modeling. It is not the risk being discussed in public.
How to design AI products for stable gross margins
Designing for usage drift means assuming it's coming and engineering for it from day one. A few of the moves we lean on, none of them exotic:
Prompt caching as architecture, not optimization. Big system prompts, reference contexts, and retrieved documents get cached by default. Pay once, reuse many times. The cost savings are substantial. The bigger win is the discipline of designing for cache, because it forces real clarity about what context is actually reusable and what should be pruned.
Model routing by task. Haiku for classification, extraction, and routing. Sonnet for most production workloads. Opus only where it genuinely earns its keep. The cost ratio between Haiku and Opus is five-to-one. Routing is the single highest-leverage cost lever in most AI products, and most AI products don't use it.
Hard iteration caps on agents. A ceiling of 15 to 25 tool-use loops prevents runaway burn. If an agent can't solve a problem in 25 iterations, more iterations rarely help. The cap also forces the agent designer to be honest about when the agent is making progress versus thrashing. We wrote more about this in find the ceiling before you set the floor; the same posture applies to cost.
Token-budget telemetry from day one. Every product we build instruments cost per customer, cost per feature, cost per workflow. You cannot manage what you cannot see, and usage drift is invisible without instrumentation. By the time a billing surprise lands on a CFO's desk, the patterns are months old and the remediation is harder than it needed to be.
Vendor optionality. The prompt and tool layer is designed so the underlying model is swappable. This hedges against any single provider's pricing decisions and gives you negotiating leverage at scale. We don't bet a product on the assumption that any one lab's pricing curve is forever.
Forward unit economics. We model cost not at today's usage but at two times, five times, and ten times usage growth. The reasonable assumption for a product on a healthy growth trajectory is that token consumption per customer will rise faster than customer count. The gross margin has to survive the product getting better, not just survive the product getting bigger. That cost model is part of the product itself, decided upfront in the brief, not discovered later in a quarterly review.
The advantage is not exotic engineering. It's doing these moves before the bill shows you which ones you should have done.
Who is most exposed to AI repricing
The companies that will be blindsided by the AI repricing are the ones that treated consumer subscriptions as enterprise infrastructure. They issued seats by the hundreds and called that an AI strategy. They will pay for that decision when the subsidy closes.
The companies that will absorb the shift gracefully are the ones whose AI strategy is a product, built on the API, instrumented for cost, and engineered to keep its gross margin intact as the underlying models keep getting more capable. They are not betting that token prices will stay where they are. They are betting that the product will keep doing more per user, and designing the margin to survive it.
If you're building the second kind of company and you're not sure how exposed you actually are, that's a conversation worth having before your token consumption has it for you.
Update, 2026-05-15: Two days after this post went up, Anthropic announced exactly the kind of subscription reform we described. They're splitting Claude subscriptions into separate pools for interactive vs. programmatic usage, starting June 15, 2026. Follow-up post: Anthropic just raised the prices. Here's what it actually means.
Frequently asked
Is the AI subsidy really collapsing?›Consumer and SMB flat-fee subscription pricing is genuinely subsidized.
What is usage drift?›Usage drift is the phenomenon where the number of tokens consumed per unit of customer value rises over time, even when per-token prices are stable or declining.
How is API-based pricing different from subscription pricing?›API token pricing is already close to true compute cost. The subsidy gap is concentrated in flat-fee consumer subscriptions where one heavy user can outconsume their seat price by ten or twenty times.
How do you design AI products to absorb usage drift?›Treat prompt caching as architecture, not optimization. Route tasks to the cheapest model that can handle them (Haiku for classification, Sonnet for most work, Opus where it earns its keep).
Who is most exposed to AI repricing?›Companies that treated consumer subscriptions as enterprise infrastructure, issuing hundreds of ChatGPT Plus or Claude Pro seats and calling that an AI strategy.
Considered takes, in your inbox.
We write when we learn something worth sharing. No schedule, no marketing digests. Built for engineers and product owners shipping with agents.