Test-Driven Development has always been slightly misunderstood — even by people who practice it.

The name doesn’t help. “Test-Driven” sounds like it’s primarily about tests. Coverage metrics. Regression safety. The QA team’s peace of mind. But anyone who has worked seriously with TDD, or spent time with practitioners like Emily Bache, knows that tests are almost a side effect. The real output is understanding.

TDD, done well, is a method for thinking your way through a problem one small step at a time. You don’t start with a complete picture of the solution. You start with the smallest possible question: what is the simplest behavior this code should exhibit? You write a test for that. You make it pass. And in the process of making it pass, you learn something — about the problem, about your assumptions, about the design that is quietly trying to emerge.
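To make that first small step concrete, here is a minimal sketch. The leap-year problem and the name is_leap_year are illustrative assumptions, not something taken from a real project. The implementation is deliberately the simplest thing that passes the tests written so far.

```python
# Hypothetical example: the leap-year problem and the function name are
# illustrative assumptions chosen to show one small TDD step.

def is_leap_year(year: int) -> bool:
    # Simplest code that satisfies the tests written so far.
    # It is knowingly incomplete: century years are not handled yet.
    return year % 4 == 0


def test_year_divisible_by_four_is_leap():
    assert is_leap_year(2024)


def test_ordinary_year_is_not_leap():
    assert not is_leap_year(2023)


if __name__ == "__main__":
    test_year_divisible_by_four_is_leap()
    test_ordinary_year_is_not_leap()
    print("both tests pass")
```

The learning shows up with the next question. A test asserting that 1900 is not a leap year would fail against this code, and that failure is what exposes the assumption hiding in "divisible by four".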

That learning is the point. And it’s exactly what Agentic AI puts at risk.

The Temptation of the Full Spec

Here’s the scenario that’s becoming increasingly common: a developer has a reasonably well-defined feature to implement. They spend time writing a thorough specification — inputs, outputs, edge cases, constraints. They hand it to an AI agent. Twenty minutes later, they have a complete implementation with a full test suite.

From a pure throughput perspective, this looks like a win. Hours of careful, incremental work compressed into minutes.

But look at what happened to the TDD cycle. In classic TDD, those twenty minutes of AI output would have been ten or fifteen cycles of: write a failing test, write the minimal code to pass it, notice what the design is telling you, refactor, repeat. Each cycle is a small conversation between you and the problem. Each refactor is a moment where you ask: is this the right abstraction? Does this name capture what I actually mean? Would I understand this in six months?

When you hand a complete specification to an AI and accept the output, you skip every one of those conversations. The code may be correct. The tests may pass. But the understanding that TDD was supposed to generate — the deep familiarity with the design space, the sense of which parts are fragile and why — that understanding didn’t happen. Not in you.

The Ghost of BDUF

There’s a more structural problem lurking here, one that Emily Bache has pointed to directly: if AI-assisted development requires a complete specification before the first line of code is written, we have quietly drifted back toward Big Design Up Front.

BDUF was the dominant paradigm before Agile. The idea was that if you specified everything carefully enough at the start, implementation would be straightforward. Decades of painful experience taught us that this doesn’t work — not because developers are bad at implementing specifications, but because the act of implementation always reveals things that the specification missed. Reality is more complex than the model. The edge cases multiply. The requirements were underspecified in ways nobody noticed until someone tried to actually build the thing.

TDD was, in part, a response to BDUF. It said: don’t try to design everything upfront. Let the design emerge through the discipline of small, concrete steps. Trust the process.

Agentic AI, used naively, inverts this lesson. It rewards completeness of specification. The more you define upfront, the better the output. Which means the pressure is back on: think everything through before you start. Anticipate every edge case. Leave nothing ambiguous.

The irony is that this is hardest precisely when it matters most — on novel, complex problems where the full shape of the solution isn’t yet knowable.

Three Ways to Keep the Discovery Alive

I don’t think the answer is to avoid AI in TDD workflows. The efficiency gains are real and the tooling is only getting better. But I do think we need to be deliberate about preserving the parts of TDD that AI tends to bypass. Here’s how I approach it:

Incremental prompting, not full-spec prompting. Instead of handing the AI a complete specification, feed it one requirement at a time — the same way you would in a manual TDD cycle. Ask it to implement only enough to pass the current test, then stop. This maintains the iterative rhythm and forces you to stay engaged with the design at each step rather than reviewing a fait accompli.

AI as Socratic partner, not answer machine. Before asking the AI to write code, ask it to help you think. What edge cases am I not considering? What would break this design? What is the next test case that would reveal a flaw in my current implementation? This kind of prompting sharpens your thinking rather than bypassing it. The AI becomes the rubber duck that occasionally pushes back.

Own the refactoring. In the classic Red-Green-Refactor cycle, the Green phase (writing the quickest code that makes the test pass) is almost mechanical. This is where AI genuinely excels and where delegating makes sense. But the Refactor phase is where the design thinking happens. Keep that in your hands. Look at what the AI produced, confirm that the tests pass, and then ask yourself: what does this code want to become? Don’t let the AI answer that question for you.
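The split between Green and Refactor can be sketched as follows. Everything here is hypothetical: the shipping-cost problem, the prices, and the names shipping_cost_green, Rate, and shipping_cost are assumptions made for illustration. The first function stands in for quick Green-phase code of the kind you might delegate. The second is one possible result of a human-led refactor that keeps the same behavior.

```python
from dataclasses import dataclass

# Hypothetical example: the domain, prices, and names are illustrative
# assumptions, not taken from a real codebase.


def shipping_cost_green(weight_kg: float, express: bool) -> float:
    """Green phase: quickest code that passes the tests, duplication and all."""
    if express:
        if weight_kg <= 1:
            return 10.0
        return 10.0 + (weight_kg - 1) * 4.0
    if weight_kg <= 1:
        return 5.0
    return 5.0 + (weight_kg - 1) * 2.0


@dataclass(frozen=True)
class Rate:
    """A name for the concept the Green code was hiding in its branches."""
    base: float
    per_extra_kg: float


STANDARD = Rate(base=5.0, per_extra_kg=2.0)
EXPRESS = Rate(base=10.0, per_extra_kg=4.0)


def shipping_cost(weight_kg: float, rate: Rate) -> float:
    """Refactor phase: same behavior, with the first kilogram covered by the base price."""
    extra_kg = max(0.0, weight_kg - 1.0)
    return rate.base + extra_kg * rate.per_extra_kg


if __name__ == "__main__":
    # The same expectations hold before and after the refactor.
    for weight in (0.5, 1.0, 3.0):
        assert shipping_cost_green(weight, express=False) == shipping_cost(weight, STANDARD)
        assert shipping_cost_green(weight, express=True) == shipping_cost(weight, EXPRESS)
    print("refactor preserved behavior")
```

The decision to introduce Rate is the design judgment in question. Nothing in the tests demanded it. It came from reading the Green code and noticing that two branches were describing the same idea with different numbers.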

The Maintainability Problem Nobody Is Talking About

There’s a longer-term concern that I think deserves more attention than it currently gets.

AI-generated code tends to be locally correct but globally unaware. It handles the specified requirements well. It doesn’t know about the architectural decisions made six months ago, the naming conventions your team settled on after a long debate, the subtle reason why a particular abstraction was deliberately avoided in this part of the codebase.

When this code accumulates — when a significant portion of a system is AI-generated without careful human curation — you can end up with a codebase that is technically functional but cognitively opaque. Every piece works. The overall structure makes sense to no one in particular.

TDD, practiced with discipline, is one of the best defenses against this. Tests document intent. The refactoring cycle creates coherence. The incremental design process produces systems that reflect the actual shape of the problem rather than the shape of the specification someone wrote before they fully understood the problem.
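As a small illustration of tests documenting intent, consider the sketch below. The function parse_quantity and the reasons embedded in the test names are hypothetical, invented for this example. The point is that each test name records why a behavior exists, which is the kind of context generated code has no way of knowing on its own.

```python
# Hypothetical example: parse_quantity and the business reasons in the
# test names are illustrative assumptions.


def parse_quantity(text: str) -> int:
    """Parse an order quantity typed by a user."""
    value = int(text.strip())
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


def test_surrounding_whitespace_is_ignored_because_input_is_pasted_from_forms():
    assert parse_quantity(" 3 ") == 3


def test_zero_is_rejected_rather_than_treated_as_an_empty_order():
    try:
        parse_quantity("0")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a zero quantity")


if __name__ == "__main__":
    test_surrounding_whitespace_is_ignored_because_input_is_pasted_from_forms()
    test_zero_is_rejected_rather_than_treated_as_an_empty_order()
    print("intent documented and verified")
```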

Used well, AI can accelerate each step of that process. Used carelessly, it bypasses the process entirely — and the technical debt that results is particularly hard to address, because the understanding that would normally accumulate during development never did.

More Necessary Than Ever

My answer to the question I raised on LinkedIn is this: TDD is not made obsolete by AI. It becomes more important.

Not as a testing technique — automated test generation will only get better, and that part of TDD’s value proposition will increasingly be commoditized. But as a thinking discipline. As a way of staying in genuine contact with the problem you’re solving rather than delegating that contact to a model that doesn’t know what you don’t know.

The developers who will build the best systems in an AI-assisted world won’t be the ones who use AI most aggressively. They’ll be the ones who use it most deliberately — who know when to let it run and when to insist on doing the thinking themselves.

The insight that TDD was always trying to develop is now more valuable than ever. The question is whether we’ll protect the conditions that allow it to happen.


This post is part of an ongoing series exploring the intersection of engineering practice and the changing landscape of AI tooling.