What Andrej Karpathy Learned About AI Coding in 12 Months
In February 2025, Andrej Karpathy — OpenAI co-founder, former Tesla AI chief — posted about a new way he was building software. He called it vibe coding: "fully giving in to the vibes, embracing exponentials, and forgetting that the code even exists." He described letting LLMs handle all code generation while he provided goals, examples, and feedback in natural language. He built a prototype called MenuGen this way.
The term stuck immediately. Merriam-Webster added it as a slang entry by March 2025. Collins Dictionary named it one of their words of the year. 92% of developers reported experimenting with it. The framing captured something real about what AI coding tools had unlocked: you didn't need to understand the implementation if the output did what you wanted.
Exactly one year later, Karpathy upgraded the terminology.
The Original Vibe
Vibe coding described a shift in what developers actually spent their attention on. Instead of reading documentation, writing loops, and debugging type errors, you describe the problem, accept the AI's solution, run it, and describe what to fix if it breaks.
This isn't laziness. It's attention allocation. The cognitive work moved from implementation to specification — from "how does this work" to "does this do what I need." For prototyping and exploration, this was genuinely transformative. Development that would have taken days took hours.
But the term carried a specific flavor: casual, intuitive, vibe-forward. You weren't engineering. You were collaborating with an AI that happened to be very good at code. The mental model was: AI does the work, you steer.
What Changed in 12 Months
The models got substantially better. Claude Code, Cursor, and GitHub Copilot went from "impressive autocomplete" to "can maintain a codebase." Claude Opus 4.5 reached 80.9% on SWE-bench Verified. Teams at enterprise scale started using these tools seriously, not just for prototyping.
With that maturity came new problems. Vibe coding's casual model — describe, accept, move on — started breaking down on complex, long-running tasks. An AI that writes good code in a single session can also accumulate bad decisions across sessions. Without structure, without discipline, without someone thinking carefully about what the agent should and shouldn't do, the errors compound.
Vibe coding is fine when the stakes are low and reversibility is high. It doesn't scale to maintaining a real system with real users.
Why He Upgraded the Term
On February 5, 2026, Karpathy updated his framing. In a post, he introduced "agentic engineering" as his preferred term for where the practice had evolved:
"'agentic' because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight — 'engineering' to emphasize that there is an art & science and expertise to it."
The key phrase is "acting as oversight." Vibe coding didn't have oversight. You accepted what the AI gave you. Agentic engineering requires you to understand the system well enough to catch what the agents get wrong — and to structure their work so the errors are catchable.
"Engineering" does a lot of work in that formulation. It implies process, discipline, and expertise. These are things you can get better at. Vibe coding was a vibe. Agentic engineering is a practice.
What Agentic Engineering Looks Like From the Inside
We've been running agentic engineering systems for six weeks to build this publication. The Claude Code sessions that wrote our test suite, fixed our SEO, reduced our PWA cache from 27MB to 3MB, and shipped six articles are agentic engineering in Karpathy's sense.
The distinction is immediately recognizable from inside the work.
Vibe coding would look like: describe a feature, accept the implementation, ship it. Fast, low-friction, high-error-rate on complex tasks.
What we actually do looks like: define the task clearly, set the constraints, let the agent work, review the output, catch the failure modes that the agent can't see itself. The agent handles 95% of keystrokes. The human handles the judgment calls — about what to build, whether the implementation actually solves the problem, and when the agent is optimizing the wrong thing.
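That loop — define the task, constrain it, let the agent work, review, feed back what's wrong — can be sketched as code. This is a minimal illustration, not anyone's actual tooling: `Task`, `Review`, `agent`, and `reviewer` are all hypothetical names standing in for a real agent session and a human judgment call.

```python
from dataclasses import dataclass


@dataclass
class Task:
    goal: str
    constraints: list[str]  # e.g. "no schema changes", "tests must pass"


@dataclass
class Review:
    approved: bool
    notes: str = ""  # the human's description of what to fix


def run_with_oversight(task, agent, reviewer, max_rounds=3):
    """Let the agent do the keystrokes, but gate every result
    behind a human review before it ships."""
    feedback = ""
    for _ in range(max_rounds):
        output = agent(task, feedback)    # agent handles the implementation
        review = reviewer(task, output)   # human handles the judgment call
        if review.approved:
            return output
        feedback = review.notes           # describe what to fix, try again
    raise RuntimeError("escalate: agent could not satisfy the reviewer")
```

The point of the structure is the gate: nothing leaves the loop without a human deciding it actually solves the problem, and a bounded number of rounds forces escalation instead of silent drift.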
That last part matters most. The trained incapacity problem — agents that become so good at execution they stop noticing when the task itself is wrong — is the central failure mode of agentic systems. Our agents merged dozens of PRs while the site had zero Google-indexed pages. The code was excellent. The strategic judgment was absent.
Agentic engineering doesn't eliminate this problem. But it creates a structure in which a human can catch it: regular reviews, defined success criteria, explicit separation between "what the agent does" and "what the human must decide."
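One way to make that separation explicit is to write it down as a session charter the tooling can consult. This is a hypothetical sketch — the charter contents and the `needs_human` helper are illustrative, not a description of any real system — but it shows the shape: delegation is enumerated, and everything else defaults back to a person.

```python
# Hypothetical session charter: what the agent may do on its own,
# what a human must decide, and how "done" is defined.
SESSION_CHARTER = {
    "agent_owns": ["implementation", "tests", "refactors within the diff"],
    "human_decides": ["what to build", "schema changes", "anything user-facing"],
    "success_criteria": ["test suite green", "no new dependencies"],
}


def needs_human(action: str) -> bool:
    # Default to oversight: anything not explicitly delegated
    # to the agent comes back to a person.
    return action not in SESSION_CHARTER["agent_owns"]
```

The design choice worth noting is the default: an action is autonomous only if it appears on the delegation list, so a novel or ambiguous action escalates rather than slipping through.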
The Skill Question
Karpathy is explicit that agentic engineering is something you can improve at. This is a meaningful departure from how AI coding tools are often marketed: as if the skill lies entirely in the tools. Get access, get results.
That's not what experienced practitioners report. About a third of senior developers with 10+ years of experience generate more than half their code with AI. The developers who get the most value from these tools are the ones with the most domain knowledge — precisely because they can catch what the agent gets wrong and articulate what it should do instead.
The people who struggle most with agentic engineering are often the ones who try to use it to bypass expertise they don't have yet. Vibe coding can help you prototype things you don't fully understand. Agentic engineering requires you to understand the system well enough to supervise it.
This creates an interesting inversion. AI coding tools were supposed to democratize software development — let non-developers build things. They do. But they also create a new kind of leverage for experienced engineers who already know what "done" looks like.
What Comes Next
Karpathy's prediction for 2026: "continued improvements on both the model layer and the new agent layer."
The model layer is the part everyone watches — benchmark scores, context windows, reasoning ability. The agent layer is the part most organizations haven't figured out yet: the tooling, workflows, and human structures that make agents reliable at scale.
Anthropic said this week that 2025 enterprise AI pilots were "mostly premature" — not because the models were weak, but because organizations deployed them without building the right human infrastructure around them. That's exactly the agentic engineering gap. The models were ready. The practices weren't.
The year 2026 doesn't belong to the teams with the best models. It belongs to the teams that figured out how to use good models well.
WuKong AI is built by an AI agent system. The agent logs, PR history, and session architecture are documented in [the case study](https://wukongai.io/article/agent-system-case-study).