AI-Native Methodology

The Big Four Didn't Standardize on a Model. They Standardized on Governance Posture.

Bill Cava

On May 19, the CEO of KPMG announced that 276,000 people across 138 countries would start working with Anthropic's Claude through a new platform called the KPMG Digital Gateway. The deal anchors the largest single AI rollout the firm has ever made. The first sentence Bill Thomas put on the record was not about capability. It was not about productivity. It was about something else.

Security, trust, governance rather than speed alone.

Bill Thomas, KPMG Global Chairman & CEO, May 19, 2026

The CEO of a Big Four firm picked the framing himself. He led with governance. That sentence is the news.

What did the Big Four actually buy?

In eight days between May 14 and May 21, three of the Big Four committed to permanent foundation-model platforms. The clustering is the first thing to notice. The framing is the second.

PwC went first on May 14, expanding its Anthropic partnership to roll out Claude Code and Cowork across its 364,000-person workforce, with a Claude-native Office of the CFO as the first at-scale Claude-native business unit at a Big Four firm. KPMG followed on May 19 with the 276,000-employee, 138-country deal anchored on the KPMG Digital Gateway, with full implementation in September 2026, starting in tax and legal. EY closed the cluster on May 21 with a $1 billion-plus, five-year Microsoft extension scaling Microsoft 365 Copilot to 400,000+ EY people through the E7 Frontier Suite. Combine those with Deloitte's October 2025 Anthropic baseline and you arrive at roughly 1.17 million professionals now standing on a single foundation-model platform per firm.

The trade press read this as enterprise AI arriving. Fortune ran "Big Four consulting has 2 AI nightmares. KPMG's answer to both is the same." Reuters, Capital Brief, and most LinkedIn commentary all landed in the same place: validation that the AI productivity story has crossed into the audit-and-advisory mainstream. If you stop reading at the seat counts, the consensus holds.

Read the announcements again with one different question. Not what did they buy, but on whose behalf did they buy it. The Big Four audit, advise, and underwrite the Fortune 500. Their reputations are made of one substance only: the trust their signature confers on a piece of work. A signature is a governance artifact. So when KPMG's CEO names governance first, that is not a marketing sentence. It is a precise description of what KPMG just bought.

Why governance first, and not capability?

The Big Four did not pick the most capable model because there is no capability winner to pick. Stanford HAI's 2026 AI Index Report ran a new accuracy benchmark across 26 leading foundation models and found hallucination rates from 22% to 94%. The same report shows GPT-4o accuracy dropping from 98.2% to 64.4% under the more rigorous test. DeepSeek R1 collapses from over 90% to 14.4%. Seventy-four percent of enterprise respondents now cite inaccuracy as their top AI risk, up 14 points year over year. Even the best models produce inaccurate outputs roughly one in five times. There is no capability standardization available in that field. There is only governance standardization. (The enterprise rollback data we covered last week tells the same story from the deployment side.)

What KPMG, PwC, and EY bought is a defensible posture: one vendor per firm, one documented Center of Excellence, one Microsoft Azure or equivalent deployment path with the firm's name on the governance framework, and a quote from the CEO on the record about trust. The deals are insurance, not innovation. If a tax or legal deployment goes sideways six months from now, the answer is not "we picked the best model." The answer is "we picked the model that comes with this governance contract." The audit firms know the model. They wrote it.

Ethan Mollick, who is paid to think about exactly this, called the lab-consulting move "weird" at the Sana AI Summit, noting it contradicts the value proposition of generative AI. He is right and he is wrong in interesting ways. He is right that if generative AI worked the way the vendors say it works, the Big Four would not need to certify tens of thousands of consultants on a specific model to deploy it. He is wrong that this is a contradiction. It is consistency. The labs are not selling a tool. They are selling a tool plus the humans who know how to govern it. The Big Four are buying the second half.

The reliability story has a second piece, and it is the one the deals do not honor. METR's May 9, 2026 update on long-horizon agent tasks places the 50%-success time horizon at "likely at least 16 hours," with an explicit caveat: "measurements above 16 hours are unreliable with our current task suite." That is the most honest sentence in the agent-time-horizon literature this year. The Big Four are deploying Claude into client engagements that take weeks. The METR caveat is the exact part of the capability claim the deals do not honor. The governance framework is what gets built across that gap.

What does this mean for the founder in the regulated room?

The founder building a regulated product is going to encounter the Big Four. At the audit. At the controls review. At the SOC 2 gate. At the M&A diligence. The consultant in the room will arrive with model-specific tooling, model-specific certifications, and a governance framework tuned to one vendor. The model choice is now upstream of the build whether the founder wants it to be or not.

I wrote about the frontier labs becoming consultancies five days ago. $6.25 billion across OpenAI, Anthropic, and Google in 30 days, all of it funding forward-deployed engineers inside customer organizations. That post argued the labs read the same enterprise rollback data everyone read and concluded the model layer alone does not ship outcomes. The Big Four cluster is the inverse half of the same story. The labs are becoming the customer-side distribution layer. The consultancies are becoming the labs' distribution rails. Both sides are voting against the build-with-the-API-alone thesis with billions of dollars. The shape of what both sides are buying is the same: humans embedded in the build, with a governance contract around them.

Three of the Big Four picked Claude. One picked Copilot. The convergence is the obvious story. The divergence is the interesting one. EY going Microsoft is not a hedge. It is a different read of the same problem: ecosystem-breadth over workflow-depth, the Microsoft 365 install base over the Claude-native build, governance through a vendor whose compliance posture is older than the AI. Both bets are governance-first. Both bets flatten the path between the domain expert and the build. The closest-to-the-problem framework predicts a specific consequence: when the path between the expert and the build is mediated by a fixed model carried by 1.17 million consultants, the products on the other end will look more like each other than the underlying problems would predict. AI amplifies your direction. If the direction is "make this fit our governance posture," that is what the product becomes.

The honest scope: the Big Four model is not the new way of building. It is the old way of building with a new mediator. That is a coherent product for a regulated enterprise at scale. It is not the product a founder closest to a specific domain problem wants. Both can be true. The relevant question is which one the founder is trying to build. If the answer is the second one, the model standardization happening this month is the news. The path the founder needs is different from the path the consultant just standardized.

What changes for the work itself?

KPMG's CEO put governance first. He was telling the truth. The Big Four are not betting on AI capability. They are betting on AI accountability posture. The 22-94% spread is the operational reality underneath. The 1.17 million-consultant distribution layer is the mechanism. The map is the news. And the map says: enterprise AI has not arrived. Enterprise AI insurance has.

Frequently asked

Which AI model did the Big Four pick?
PwC, KPMG, and Deloitte standardized on Anthropic's Claude (roughly 1.04 million seats combined). EY went a different direction with a $1 billion+ Microsoft extension scaling Microsoft 365 Copilot to 400,000+ employees via the E7 Frontier Suite.
Why did KPMG choose Claude over Copilot?
KPMG framed the decision around governance posture, not capability. CEO Bill Thomas led with 'security, trust, governance rather than speed alone.' What KPMG bought was a single-vendor governance contract its auditors can defend, not the most capable foundation model on a benchmark.
What does the Big Four AI standardization mean for clients?
The consultant in the room now arrives with model-specific tooling and a governance framework tuned to one vendor. The model choice is upstream of the build whether the client wants it to be or not. For founders closest to a specific domain problem, this flattens the path between expertise and product.
Are AI hallucination rates a problem for Big Four deployments?
Stanford HAI's 2026 AI Index Report found hallucination rates ranging from 22% to 94% across 26 leading foundation models. There is no capability winner. The Big Four did not standardize on accuracy. They standardized on a defensible governance posture that holds even when the underlying model is wrong.
Why are AI labs and consulting firms competing now?
The frontier labs spent $6.25B on forward-deployed engineering in 30 days. The Big Four spent on locking in lab platforms across 1.17M consultants. Both sides read the same enterprise rollback data and concluded the model layer alone does not ship outcomes. The two clusters are inverse halves of the same story.