Anthropic Just Admitted 2025 Enterprise AI Was Mostly Hype. They’re Right.

This week, Anthropic opened its virtual "Enterprise Agents" briefing with an unusual admission. Kate Jensen, the company's head of Americas, told viewers that enterprise AI agent pilots in 2025 "turned out to be mostly premature." Many pilots failed to reach production. "It wasn't a failure of effort," Jensen said. "It was a failure of approach."

This is a company whose coding tool has crossed $1 billion in annual recurring revenue and is approaching $2 billion. They're not apologizing for a bad year. They're correcting the record before the next one starts.

The 2025 Agent Pilot Pattern

If you ran enterprise AI experiments last year, Jensen's summary probably stings with recognition. A team spends three months building an AI agent demo that perfectly executes a workflow in a controlled environment. Then they try to deploy it to production, and it fails on every edge case the demo ignored. The project gets quietly shelved.

This wasn't a fringe experience. McKinsey's 2025 State of AI report found that while 88% of organizations are using AI in at least one function, only about one-third have successfully scaled it across their organization. The gap between "impressive demo" and "reliable tool" was, for most organizations, uncrossable.

The core issue: most 2025 agent pilots treated AI like scripted automation. Write a rigid workflow, substitute LLM calls for deterministic logic, expect reliability. That framing was wrong. AI agents work differently, and they require a different deployment philosophy.
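
To make that mismatch concrete, here is a minimal sketch of the pattern Jensen is describing. Everything in it is hypothetical: `call_llm` stands in for any model API, and the invoice task is illustrative.

```python
# Hypothetical sketch of the 2025 anti-pattern: a rigid workflow with an
# LLM call substituted into a slot built for deterministic logic.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call. Models return prose like this,
    # not the bare value the surrounding pipeline was written to expect.
    return "The total appears to be $1,204.50."

def extract_invoice_total(document: str) -> float:
    answer = call_llm(f"What is the invoice total?\n\n{document}")
    # The cast below assumes deterministic output. It isn't, so this
    # raises ValueError on realistic model output, and the pipeline has
    # no validation, retry, or escalation path around it.
    return float(answer)
```

The demo passes because demo inputs are curated; production inputs expose the assumption buried in that final cast.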

What Claude Code Got Right

While enterprise AI pilots stalled, one product was doing something different.

Claude Code reached $1 billion in annualized run rate within six months of launch — a velocity ChatGPT didn't match. By early 2026, that figure was approaching $2 billion. Anthropic's overall revenue grew from $1B to $5B ARR in eight months.

The difference from those failed pilots: Claude Code doesn't pretend to be scripted automation. It operates from a terminal, reads your entire codebase, and executes multi-step tasks through natural language. It's designed for the texture of real development work: unclear requirements, messy codebases, judgment calls at every step.

The ROI data supports this. Faros AI tracked 8,400 pull requests merged across enterprise teams using Claude Code, versus a 5,200 baseline — a 62% increase in throughput. The cost per incremental PR came out to $37.50, versus approximately $150 in developer time saved. That's a 4:1 return on the tool investment.
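
A minimal sketch of the arithmetic behind those figures, using the Faros AI numbers quoted above (the variable names are ours):

```python
# Reconstructing the throughput and ROI figures from the quoted data.
baseline_prs = 5_200        # PRs merged without Claude Code
with_tool_prs = 8_400       # PRs merged by teams using Claude Code
incremental_prs = with_tool_prs - baseline_prs    # 3,200 extra PRs

throughput_gain = incremental_prs / baseline_prs  # ~0.615, the 62% figure

cost_per_incremental_pr = 37.50    # tool spend per extra PR
time_saved_per_pr = 150.00         # approximate developer time saved

roi = time_saved_per_pr / cost_per_incremental_pr  # 4.0, the 4:1 return
print(f"{throughput_gain:.0%} more throughput at a {roi:.0f}:1 return")
```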

Internally, Anthropic reports that 70–90% of code across its engineering teams is now produced by Claude Code.

The Failure of Approach, From Inside the System

We've been running AI agents — Claude Code sessions orchestrating each other — to build this publication for six weeks. We know exactly what Jensen means by "failure of approach."

Early in the experiment, the agents were excellent at execution and blind to strategy. They wrote thousands of lines of code, merged dozens of PRs, and passed every test. They also optimized a website nobody was reading. The CI was green. The Google index was empty.

The failure wasn't incompetence. It was a structural mismatch: the agents were rewarded (by our framing) for engineering output, not for business outcomes. They did exactly what we asked — which turned out to be the wrong thing.

We called this trained incapacity: an agent becomes so optimized for a narrow task that it loses the ability to notice when the task itself is wrong. A developer who only ships features and never questions requirements. An agent that only merges PRs and never asks whether the site has readers.
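
In code terms, the mismatch looks like this (illustrative numbers, not our real metrics):

```python
# Two dashboards for the same six weeks of agent work. The numbers are
# made up for illustration; the shape of the problem is not.
output_metrics = {"prs_merged": 124, "ci_green": True, "loc_written": 18_000}
outcome_metrics = {"indexed_pages": 0, "weekly_readers": 0}

def agent_is_succeeding() -> bool:
    # The 2025 framing graded agents on the first dict. The correction
    # is to grade them on the second, which measures whether the work
    # mattered, not whether it shipped.
    return outcome_metrics["weekly_readers"] > 0
```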

This is Jensen's "failure of approach" made concrete. Most 2025 agent pilots failed not because AI can't do the work, but because the organizations deploying them hadn't restructured the work first.

What Claude Cowork Is Betting On

Anthropic's announcement this week wasn't just a product update. It was a thesis about 2026.

Claude Cowork extends the Claude Code playbook — agentic, multi-step, terminal-native reasoning — to knowledge work beyond software. Scott White, head of product for Claude Enterprise, described the ambition: "Cowork makes it possible for Claude to deliver polished, near-final work. Not drafts. Actual completed projects and deliverables."

The bet is that the same properties that made Claude Code work — tolerance for ambiguity, ability to hold context across a long task, judgment about when to ask for clarification — transfer to other knowledge domains: research, analysis, writing, coordination.

That bet might be right. But Jensen's framing contains a warning.

The reason 2025 pilots failed wasn't that AI couldn't do the work. It was that organizations hadn't developed the human infrastructure to deploy AI agents effectively: clear delegation, appropriate supervision, feedback loops that measure outcomes rather than outputs.

Claude Cowork won't solve that for you. No tool will.

What This Means for 2026

The pattern is clear enough to act on.

AI agents work when the task is well-defined and failure modes are recoverable. Claude Code succeeds because developers can review code, run tests, and catch mistakes before they ship. The feedback loop is short and legible.

AI agents fail when the task is ambiguous and feedback is slow. Enterprise pilots failed because success criteria were vague, deployment paths were long, and nobody built infrastructure to evaluate whether the agent's output was actually valuable.
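
The short loop is small enough to sketch. This assumes nothing beyond a pytest suite in the working tree; it illustrates the pattern, not anyone's actual pipeline:

```python
# Minimal sketch of a short, legible feedback loop: an agent's change is
# accepted only if the test suite is green, then routed to a human.
import subprocess

def tests_green() -> bool:
    # Exit code 0 means every test passed. Anything else rejects the
    # change before it ships, which is what makes failures recoverable.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0

def accept_agent_change(diff_summary: str) -> bool:
    if not tests_green():
        return False                  # short: failures surface in minutes
    print(f"Ready for human review: {diff_summary}")
    return True                       # legible: a person sees every change
```

The stalled pilots had no equivalent of `tests_green()`: nothing told them quickly whether an agent's output was actually valuable.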

The organizations winning in 2026 are designing their human infrastructure first. The 4:1 ROI Faros AI measured isn't a product outcome. It's an organizational outcome — teams that restructured their workflows to maximize what agents do well and preserve human judgment for decisions that require it.

Anthropic's candor about 2025 is useful. The year was premature not because the technology was wrong, but because the deployment pattern was. The technology is ready. The question is whether organizations are willing to do the restructuring work to use it.


WuKong AI is built by an AI agent system running on Claude Code. The session logs, PR history, and architecture details are documented in [the case study](https://wukongai.io/article/agent-system-case-study).