
Generating Code Is Easy. Dependable Software Is Not.

Bill Cava/June 23, 2026

You can generate a working app in an afternoon now. That part is real. Google reports that 41% of new code is AI-generated and 85% of developers use AI agents, more than half of them daily. The demo that used to take a sprint takes a prompt.

Then you try to run it for real, and the afternoon's work meets the 2am page. Reliability. Security. The edge case nobody prompted for. The data that has to stay consistent when two requests land at once. The demo proved the happy path. Dependable software is everything the happy path skips.

That gap is the whole story right now, and the most useful description of it comes from an unlikely place: Google's own paper on how AI changes the way software gets built.

What did AI actually make easy?

It made generation easy, not dependability. Those are different problems, and conflating them is the central confusion of this moment. Generating code is now fast and cheap. Making software you can stake a business on is still slow, still hard, and still mostly about judgment.

For years the two looked like one problem, because writing the code was the slow part and it sat in front of everything else. We wrote yesterday that the bottleneck has moved off of code. This is the same shift seen from the production side: once generation stops being the constraint, what's left exposed is everything generation was hiding. The fragile parts don't get easier. They just stop being hidden behind the slow part.

What separates vibe coding from dependable software?

Verification. Not prompt quality, not model choice, not how agentic your setup is. Google's 2026 paper The New SDLC With Vibe Coding, by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, frames a spectrum from vibe coding through structured AI assistance to full agentic engineering, and it puts the dividing line in one place.

The single biggest differentiator between the two ends is how outputs get verified.

Addy Osmani, Shubham Saboo & Sokratis Kartakis, Google, The New SDLC With Vibe Coding (2026)

Vibe coding asks one question: does it seem to work? Agentic engineering asks a harder one, and answers it with tests (deterministic checks that the code does what it should) and evals (checks built for non-deterministic AI output, where the same input can produce different results). The paper is blunt about the threshold: "Without both, the practice is always vibe coding, regardless of how sophisticated the prompts are."

The two halves do different jobs, and that's why you need both. A test pins down behavior that should never vary: given this input, the function returns exactly that, every time, or the build fails. That's the layer that catches a refactor quietly breaking a tax calculation. An eval handles the part of the system where the output is a model's, not a function's: it scores whether a generated summary stayed faithful to the source, whether an agent refused the request it should have refused, whether quality held across a hundred runs rather than the one you happened to watch. Generation gives you something that passes the demo. Tests prove the deterministic parts stay correct as the code changes, and evals prove the probabilistic parts stay acceptable across the variation the demo never showed you. Skip either and you're trusting a single lucky run.
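To make the split concrete, here is a minimal sketch of the two layers side by side. Everything in it is illustrative and not taken from the paper: `sales_tax`, `fake_summarizer` (a random stand-in for a real model call), the substring-based `is_faithful` check, and the sample source text are all assumptions. The test asserts one exact value. The eval reports a pass rate over many runs.

```python
import random
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical source text for the eval; not from the article.
SOURCE = "Sales rose 4 percent. Costs were flat. Hiring paused in March."


def sales_tax(amount: Decimal, rate: Decimal) -> Decimal:
    """Deterministic code: the same inputs must always give the same result."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_sales_tax() -> None:
    # A test pins exact behavior; any drift after a refactor fails the build.
    assert sales_tax(Decimal("19.99"), Decimal("0.0825")) == Decimal("1.65")


def fake_summarizer(source: str, rng: random.Random) -> str:
    """Stand-in for a model call (assumption): output varies from run to run."""
    sentences = [s.strip() for s in source.split(".") if s.strip()]
    picked = rng.sample(sentences, k=2)
    # Occasionally add a claim the source never made, as a real model might.
    if rng.random() < 0.1:
        picked.append("Revenue tripled")
    return ". ".join(picked) + "."


def is_faithful(summary: str, source: str) -> bool:
    """Crude check: every sentence in the summary appears in the source."""
    return all(s.strip() in source for s in summary.split(".") if s.strip())


def eval_pass_rate(source: str, runs: int = 100, seed: int = 0) -> float:
    """An eval scores many runs and reports a rate, not a single pass/fail."""
    rng = random.Random(seed)
    passed = sum(
        is_faithful(fake_summarizer(source, rng), source) for _ in range(runs)
    )
    return passed / runs


if __name__ == "__main__":
    test_sales_tax()
    rate = eval_pass_rate(SOURCE)
    print(f"faithful in {rate:.0%} of runs")
```

In practice the eval would gate on a threshold chosen for the risk involved (for example, failing the pipeline when the pass rate drops below an agreed floor), and the faithfulness check would be far stronger than substring matching.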

That line reframes the whole debate. The question was never "do you use AI." A staff engineer with tests and evals and a casual prompt is doing engineering. Someone with an elaborate prompt chain and no verification is vibe coding, however impressive the output looks. The structure around the output is the craft. The output is just the output.

The paper makes the stakes concrete in a way worth repeating: telling a CTO that your team is vibe coding the payment processing system will, and should, raise alarm bells. Where you have to sit on the spectrum is set by what breaks if you're wrong.

Why is the gap suddenly so visible?

Because removing the generation bottleneck released a flood of fast, fragile software, and verification became the entire story. When code was slow to write, dependability problems were rationed by the same scarcity that rationed everything else. Now anyone can produce a plausible system in an afternoon, so the plausible-but-unreliable system is everywhere.

This is the production-side version of a pattern we keep running into. It's why vibe coding hits a wall: the wall is the distance between "looks done" and "actually done," and that distance is exactly the verification work. It's the same gap the AI production paradox measures from the deployment side, where the rollback happens precisely where verification was thin. None of these is a different problem. They're the same chasm, seen from three windows.

What's the real skill now?

Context engineering, paired with the judgment to verify. The paper is specific: "the quality of AI-generated code depends less on the cleverness of your prompts and more on the quality of the context provided." The lever isn't the magic phrase. It's what you put in front of the model (the constraints, the patterns, the rules of your system) and then what you do to check what comes back.

This is good news if you can actually engineer, because engineering fundamentals matter more in the AI era, not less. Context engineering amplifies whatever culture it runs on. A team with clear architecture, real test discipline, and well-defined system rules gets all of that amplified. A team without them gets its absence amplified just as fast. The tooling is concrete now: the CLAUDE.md and AGENTS.md rule files teams keep are exactly this, durable context that shapes every generation. The discipline is real, and it's nameable.
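As a rough illustration of "durable context," the sketch below gathers whichever rule files exist in a repository and places them ahead of the task text. It is an assumption-heavy toy: `load_rules` and `build_context` are hypothetical helpers, the `## ` section headers are an arbitrary choice, and real agent tools typically load these files themselves with their own conventions.

```python
from pathlib import Path

# File names mentioned in the article; the lookup order here is an assumption.
RULE_FILES = ("AGENTS.md", "CLAUDE.md")


def load_rules(repo_root: Path) -> str:
    """Concatenate whichever rule files exist, in a fixed order."""
    parts = []
    for name in RULE_FILES:
        path = repo_root / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {name}\n{text}")
    return "\n\n".join(parts)


def build_context(repo_root: Path, task: str) -> str:
    """Put the durable rules in front of the task so every request carries them."""
    rules = load_rules(repo_root)
    header = rules if rules else "(no rule files found)"
    return f"{header}\n\n## Task\n{task.strip()}"


if __name__ == "__main__":
    print(build_context(Path("."), "Add retry logic to the payment client."))
```

The design point is that the rules live in version control next to the code, so they are reviewed and updated like any other engineering artifact, not retyped into each prompt.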

So the builder's move is simple to say and hard to do: treat the generated version as the start, not the finish. Budget for verification the way you used to budget for writing the code, because that's where the value moved. Tests, evals, and the context that makes generation trustworthy are now the majority of the work, which means they're now the majority of the value.

This is why we say AI can build anything, and the job is helping you build the right thing, dependably. Human and agentic collaboration isn't there to crank out more demos. It exists to cross the distance from demo to production, a distance that only verification and judgment close.

The hype said code was the hard part, so AI must have solved software. It didn't. It solved generation. Dependable software is still earned, the same way it always was, and now even Google's own paper says the new craft is in how you verify. The advantage goes to the people who were never just generating in the first place.

Frequently asked

What did AI actually make easy in software development?

Generation. AI collapsed the cost of producing code, with Google reporting that 41% of new code is AI-generated and 85% of developers use AI agents. What it did not make easy is dependability: reliability, security, edge cases, and data integrity under real load. The demo is easy. The dependable system is the work.

What separates vibe coding from dependable software?

Verification. Google's 2026 new-SDLC paper says the single biggest differentiator across the spectrum is how outputs get verified. Vibe coding asks 'does it seem to work.' Agentic engineering adds tests (deterministic checks) and evals (checks for non-deterministic AI output). Without both, the paper says, it is always vibe coding no matter how good the prompts are.

What is context engineering?

Providing the model the right context to produce good output. Per Google's paper, the quality of AI-generated code depends less on the cleverness of your prompts and more on the quality of the context provided. Good engineering culture gets amplified by it, and so does bad. It is becoming the core skill, not prompt phrasing.

Does AI-generated code reduce the need for engineering judgment?

No. It moves the work from typing to verifying. When generation is cheap and abundant, the scarce, valuable skill is judging whether the output is correct, safe, and maintainable, and building the tests and evals that prove it. Generation got cheaper. Knowing whether the code is right did not.
