The Review Is Too Late

There is a conversation happening right now in engineering circles about code reviews. The argument goes roughly like this: AI is generating code faster than humans can review it, PR volume is exploding, and we need smarter tooling to keep up.

That argument is correct. And it is solving the wrong problem.

The deeper issue is not that reviews don’t scale. It is that we are still treating the review as the primary quality gate — a last line of defence before code enters the codebase. In the age of AI-assisted development, that assumption needs to be challenged at its root.

Review-as-safety-net is a relic. The question is not how to make it faster. The question is how much less we should need it.

Where Quality Actually Gets Decided

Think about what a code review is actually catching. Logical errors. Design drift. Missing edge cases. Inconsistencies with the rest of the codebase. Violations of conventions the team settled on six months ago.

Now ask: at what point in the development process were those problems introduced?

Not at merge time. Not during review. They were introduced when a developer — or an AI agent — started building without sufficient clarity about what should be built, how it should fit into the existing system, and what constraints apply.

The review catches problems that were created upstream. Which means improving the review does not fix the underlying cause — it just makes the catching slightly more efficient.

If we want to genuinely improve quality in a world where AI agents can generate hundreds of lines of code in minutes, we need to move the quality work earlier. Not just automate the review.

The Specification Problem Returns

There is an uncomfortable irony in how AI-assisted development is evolving.

Agile and TDD emerged partly as a reaction to Big Design Up Front — the painful experience of trying to specify everything completely before implementation began, only to discover that implementation always reveals what specification missed. The lesson was: let the design emerge incrementally, stay in close contact with the problem, trust small feedback loops over large upfront plans.

AI agents reward the opposite instinct. The more completely you define the task, the better the output. Vague prompts produce vague code. Incomplete specifications produce implementations that are locally coherent but globally wrong.

This creates a quiet pressure toward upfront completeness that Agile deliberately moved away from.

The answer is not to abandon specification rigour — it is to be deliberate about when and how that rigour is applied. A complete spec handed to an agent in one shot is BDUF with extra steps. The same rigour applied incrementally — one requirement at a time, one feedback loop at a time — is something different. It keeps the developer in genuine contact with the problem rather than delegating that contact entirely.

I wrote about this tension in detail in "AI + TDD: A Shortcut to the Goal or a Loss of Insight?" The short version: TDD is not made obsolete by AI. Used deliberately, it is one of the best mechanisms we have for keeping specification and implementation in honest conversation with each other, even when an agent is doing the implementation.

Shifting Quality Left

So what does it look like to move quality work earlier in the pipeline?

A few things that have made a measurable difference in practice:

Clarify intent before the agent runs. The most expensive mistakes in AI-assisted development are not bad implementations — they are correct implementations of the wrong thing. Two minutes of explicit task definition, out loud, with another person, catches a surprising number of those. This is not review. It is pre-flight.

Incremental prompting over full-spec prompting. Instead of handing the agent a complete specification and reviewing the result, feed it one requirement at a time. This keeps the feedback loop tight and keeps the developer engaged with the design at each step rather than auditing a fait accompli.

Encode what should not change. Architecture tests, dependency rules, naming conventions, coverage thresholds — these can be expressed as automated guardrails that run before any human reviews anything. A pull request that violates a layering rule should never reach a reviewer. The guardrail catches it first. This is not a replacement for thoughtful review. It is a way of reserving human attention for the decisions that actually require it.

Own the refactoring. AI is good at the Green phase — writing the fastest code that makes a test pass. It is less reliable at the Refactor phase — asking whether this is the right abstraction, whether the naming reflects what the code actually does, whether the design will hold up as the system grows. That thinking belongs with the developer. Delegating it produces code that works today and confuses everyone in six months.
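To make the "encode what should not change" idea concrete, here is a minimal sketch of a layering guardrail in Python. It is an illustration under assumptions, not a description of any particular tool. The layer names (`domain`, `application`, `infrastructure`, `api`), the `FORBIDDEN_IMPORTS` table, and both function names are hypothetical. The check parses a source file with the standard-library `ast` module and reports any import of a package that the file's layer is not allowed to depend on.

```python
import ast

# Hypothetical layering rule: for each layer, the top-level packages
# that code in that layer must never import.
FORBIDDEN_IMPORTS = {
    "domain": {"infrastructure", "api"},
    "application": {"api"},
}


def imported_packages(source: str) -> set[str]:
    """Return the top-level package names imported by the given source text."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` depends on top-level package `a`
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from a.b import c`; relative imports are skipped
            packages.add(node.module.split(".")[0])
    return packages


def layering_violations(layer: str, source: str) -> list[str]:
    """List, in sorted order, the forbidden packages that `source` imports."""
    forbidden = FORBIDDEN_IMPORTS.get(layer, set())
    return sorted(imported_packages(source) & forbidden)
```

Run in CI over every changed file, a check like this would fail the build whenever `layering_violations` returns a non-empty list, so a pull request that breaks the layering rule is stopped before a human looks at it. Dedicated architecture-testing tools cover far more cases; the point here is only that the rule lives in code rather than in a reviewer's memory.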

The Human Loop That Actually Matters

In "Why We Code in Threes", I wrote about why we run sessions with two developers and one AI agent rather than one developer alone.

The reason that turned out to matter most was not the safety net. It was underspecification prevention. When one developer steers an agent alone, the task definition tends to stay implicit. The developer knows what they mean, the agent produces something plausible, and the mismatch only becomes visible when it is expensive to fix.

Two developers talking through a task before the agent runs surfaces those ambiguities before they become code. It is a quality mechanism that operates upstream of the review — and in our experience, it has had more impact on output quality than any downstream check.

The conversation you have before the agent runs matters more than the review you do after.

What Reviews Are Still For

None of this means reviews disappear. They change shape.

When automated guardrails handle structural and stylistic consistency, and when incremental development keeps design decisions visible as they happen, reviews can focus on what they were always best at and rarely had time for: thinking about intent, questioning assumptions, considering whether this is actually the right solution to the right problem.

That is a more valuable review than line-by-line inspection. It is also more sustainable — because it does not pretend that human attention scales infinitely.

The goal is not to review more efficiently. It is to need the review less — and to use it better when we do.

The Real Shift

AI is changing software development faster than our processes are adapting. The instinct to scale up existing practices — more automated reviews, more tooling at the merge gate — is understandable. But it is an optimisation of the wrong thing.

The teams that build the best systems in this era will not be the ones with the fastest review pipelines. They will be the ones who moved quality work upstream: into specification discipline, into incremental development, into the conversations that happen before the agent runs.

Review is not dead. But the idea that it should be the primary quality gate — that it is where quality gets decided — that needs to go.

Quality is not found at review time. It is built in long before.
