Test-driven development

Why good code often emerges through the test — and where the method actually pays off in real projects. A sober look beyond textbook idealism and tribal debates.

What TDD is — and why it asks for discipline

Test-driven development (TDD) inverts the usual order: write the test first, then the code that satisfies it. What sounds like a small detail changes the way software is built — and ultimately its architecture.

The method emerged in the late 1990s in the Extreme Programming community, and Kent Beck distilled it into a discipline in 2003 with his book Test-Driven Development: By Example. Three rules, later codified by Robert C. Martin as the "Three Laws of TDD", form its core:

  1. Do not write any production code until a failing test exists.
  2. Do not write more of a test than is sufficient to make it fail.
  3. Do not write more production code than is sufficient to make the test pass.

At first glance, this feels pedantic. In practice the rules anchor the discipline: anyone who sticks to them ends up working in small steps, in testable units, with clean contracts, almost without having to think about it.

TDD is not the same as "writing tests". Tests after the code are a perfectly reasonable practice and nothing to apologise for — but they are not TDD. The difference lies in the role of the test: under test-first, the test is a design tool. Under test-after, it is a safety net. Both have value, but only the former changes what the code looks like.

The context has shifted in recent years. Continuous integration, continuous deployment and short release cycles demand tests that run in seconds and create trust in minutes. AI-assisted development — code completion, agents, pair-programming assistants — adds another lever: generated code is only as reliable as the tests it has to stand against. Writing tests first gives you a safety net designed for exactly this mode of production — regardless of whether the code came from a human or a model.

The red-green-refactor cycle

The cycle runs in three phases that follow each other quickly. A single iteration typically takes a few minutes; if half an hour passes without completing a cycle, the steps have grown too large and the rhythm is gone.

  1. RED — write the test that fails

    Describe the desired behaviour as a test. Not "the code I'm about to write", but "the behaviour the code should have". The test fails, and that is good: the red bar confirms that the test actually checks something. A test that passes from the start either checks nothing, checks the wrong thing, or describes behaviour that already exists.

  2. GREEN — the minimal code that makes the test pass

    Not the beautiful code, not the final one, but the smallest one that turns the bar green. "Fake it till you make it" is explicitly allowed here: if a hard-coded return value suffices, it suffices. The temptation to write elegant code right away is strong and counter-productive: it produces code that does more than the current test demands, and that extra behaviour is not covered by any test.

  3. REFACTOR — clean up without changing behaviour

    Only now is the code shaped: remove duplication, sharpen names, separate responsibilities. The tests are the safety net; they fail immediately if refactoring changes behaviour. Skipping this phase leaves technical debt behind, and every subsequent cycle has to build on top of it.

The cycle on a concrete example

A small function that formats a Euro amount in German style illustrates the flow. Start with the first failing test:

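// RED: the test fails because formatPrice does not exist yet
// (Jest provides test/expect as globals; the module path is illustrative)
import { formatPrice } from './formatPrice';

test('formats positive amounts with two decimals', () => {
  expect(formatPrice(12.5)).toBe('12,50 €');
});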

The test fails — formatPrice doesn't exist yet. Now the minimal implementation, just enough code to turn the bar green:

// GREEN — minimal, no elegance yet
export function formatPrice(value: number): string {
  return value.toFixed(2).replace('.', ',') + ' €';
}
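
The GREEN step above goes straight to a general implementation. A stricter reading of "fake it till you make it" starts with a constant and lets a second test force the generalisation; Kent Beck calls this move triangulation. The sketch below shows that sequence. The helper check and the name fakeFormatPrice are hypothetical; check stands in for Jest's expect(...).toBe(...) so the block runs without a test framework.

```typescript
// Hypothetical stand-in for Jest's expect(actual).toBe(expected).
function check(actual: string, expected: string): boolean {
  return actual === expected;
}

// GREEN by faking it: a hard-coded value satisfies the only test written so far.
function fakeFormatPrice(_value: number): string {
  return '12,50 €';
}

console.log(check(fakeFormatPrice(12.5), '12,50 €')); // true: first test green
// Triangulation: a second example that the constant cannot satisfy.
console.log(check(fakeFormatPrice(3), '3,00 €')); // false: red again

// The generalisation that the second test forces.
function formatPrice(value: number): string {
  return value.toFixed(2).replace('.', ',') + ' €';
}

console.log(check(formatPrice(12.5), '12,50 €')); // true
console.log(check(formatPrice(3), '3,00 €')); // true
```

The constant is not the point; the sequence is. Each additional example narrows the space of implementations until the general one is the simplest code that passes.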

Tests green. Only now do we refactor — behaviour unchanged, but clearer to read:

// REFACTOR — improve readability, tests stay green
export function formatPrice(value: number): string {
  const formatted = value.toFixed(2).replace('.', ',');
  return `${formatted} €`;
}

The next cycle starts with the next behaviour: negative amounts, thousands separators, very large numbers, invalid inputs. Each new behaviour gets a new test that first turns red and then green, and after every green bar the refactor question comes back: can this be written more clearly without changing the behaviour?
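
One possible next cycle, sketched under assumptions: the thousands separator is '.', as is usual in German formatting, and a small hypothetical check helper replaces the Jest tests so the block runs on its own. Against the refactored formatPrice above, a test for -3 would already be green (toFixed keeps the minus sign), so the red bar in this cycle comes from the thousands separator.

```typescript
// Hypothetical stand-in for Jest's expect(actual).toBe(expected): throws on mismatch.
function check(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error(`expected "${expected}", got "${actual}"`);
  }
}

// GREEN for the new behaviour: group the integer part in threes with '.'.
function formatPrice(value: number): string {
  const fixed = Math.abs(value).toFixed(2); // e.g. "1234.50"
  const dot = fixed.indexOf('.');
  const intPart = fixed.slice(0, dot);
  const decimals = fixed.slice(dot + 1);
  // Insert '.' at every position followed by complete blocks of three digits.
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, '.');
  // Keep the sign, but avoid "-0,00 €" for tiny negatives that round to zero.
  const sign = value < 0 && fixed !== '0.00' ? '-' : '';
  return `${sign}${grouped},${decimals} €`;
}

// Earlier tests must stay green:
check(formatPrice(12.5), '12,50 €');
check(formatPrice(-3), '-3,00 €');
// New tests for thousands separators (red against the previous version):
check(formatPrice(1234.5), '1.234,50 €');
check(formatPrice(-1234567), '-1.234.567,00 €');
console.log('all checks passed');
```

Intl.NumberFormat with the 'de-DE' locale and style 'currency' would be the library route, but it typically inserts a non-breaking space before the € sign, so the exact-string expectations would have to change with it. Very large numbers stay open for a further cycle: from 1e21 upwards, toFixed returns exponential notation.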

Practical note

The most common mistake in TDD training is skipping the refactor phase. Red-green alone yields working but messy code, and the tests that grow around that messy code later become a brake on every change.

Benefits in practice

TDD doesn't deliver one single effect; it works along several dimensions. Four of them are reliably visible in real projects:

Testable architecture

Test-first forces decoupled modules with clear interfaces. Code that's hard to test simply never gets written that way.
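
A sketch of what this looks like in practice, using a hypothetical voucher rule. Written test-first, the current time cannot stay hidden inside the function as a call to new Date(), because the test needs to control it; the dependency becomes an explicit interface instead. Clock, VoucherValidator and fixedClock are illustrative names, not part of any library.

```typescript
// Explicit dependency on "now", so tests can control time.
interface Clock {
  now(): Date;
}

const systemClock: Clock = { now: () => new Date() };

// Hypothetical rule: a voucher is valid until the end of its expiry day (UTC).
class VoucherValidator {
  private readonly clock: Clock;

  constructor(clock: Clock = systemClock) {
    this.clock = clock;
  }

  isValid(expiresOn: string): boolean {
    // expiresOn is an ISO calendar date such as "2024-03-31".
    const endOfDay = Date.parse(`${expiresOn}T23:59:59.999Z`);
    return this.clock.now().getTime() <= endOfDay;
  }
}

// In a test, a fixed clock pins "now" to the instant under examination.
const fixedClock = (iso: string): Clock => ({ now: () => new Date(iso) });

const lastMinute = new VoucherValidator(fixedClock('2024-03-31T23:59:00Z'));
const nextDay = new VoucherValidator(fixedClock('2024-04-01T00:00:00Z'));
console.log(lastMinute.isValid('2024-03-31')); // true
console.log(nextDay.isValid('2024-03-31')); // false
```

Production code uses the default systemClock, and the tests never have to wait for midnight. The same move applies to other hidden dependencies such as random numbers, environment variables or network calls.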

Safe refactoring

High coverage, typically in the range of 80–90 %, falls out as a by-product. That opens the door to refactorings nobody dares to attempt without tests.

Living documentation

Tests show how a component is meant to be used. New team members read them like examples — and when the API changes, the examples change with it.

Faster debugging

Regressions surface on the day they are introduced, not weeks later in production. The distance between cause and symptom shrinks to minutes.

The benefits are cumulative: the biggest effect appears only when all four interact. High coverage on tightly coupled code stays unmaintainable; a decoupled architecture without tests soon loses its structure. TDD is therefore not a detail-level optimisation but a different way of working with a different outcome.

A second, often underestimated effect is speed over time. In the first month, TDD is noticeably slower than "code first, maybe test later". In our experience the pace levels out in the second month, because debugging time drops, and from the third month onwards TDD is faster, because refactoring takes minutes instead of days. That is when the investment starts to compound.

In regulated environments there's a further benefit beyond pure engineering: tests are durable audit artefacts. A test that states a calculation rule precisely and verifies it automatically answers an audit question more sharply than any prose specification — and it stays current, because it runs on every code change. Where the business, the regulator and engineering have to speak the same language, the test becomes the shared contract.
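
What such an executable rule might look like, as a sketch. The rule is hypothetical: VAT is computed on a non-negative net amount in integer cents and rounded half up to the cent. The 19 % and 7 % rates match the German standard and reduced VAT rates, but the rounding convention stands in for whatever the actual specification prescribes. Each case names the clause it checks, so the list reads like the rule it verifies; in a Jest suite each entry would become its own test.

```typescript
// Hypothetical rule for illustration: VAT on a net amount in integer cents,
// rate in whole percent, result rounded half up to the nearest cent.
function vatCents(netCents: number, ratePercent: number): number {
  if (!Number.isInteger(netCents) || netCents < 0) {
    throw new RangeError('netCents must be a non-negative integer');
  }
  if (!Number.isInteger(ratePercent) || ratePercent < 0) {
    throw new RangeError('ratePercent must be a non-negative integer');
  }
  // VAT in hundredths of a cent; integer arithmetic stays exact for realistic
  // amounts, so no binary floating-point rounding creeps in.
  const hundredthsOfCent = netCents * ratePercent;
  return Math.floor((hundredthsOfCent + 50) / 100); // round half up
}

// The rule, clause by clause: [clause, net in cents, rate in %, expected VAT in cents]
const clauses: Array<[string, number, number, number]> = [
  ['exact amounts need no rounding', 1000, 19, 190],
  ['less than half a cent rounds down', 1001, 19, 190],
  ['exactly half a cent rounds up', 1050, 19, 200],
  ['the reduced rate uses the same rounding', 1050, 7, 74],
];

for (const [clause, net, rate, expected] of clauses) {
  const actual = vatCents(net, rate);
  if (actual !== expected) {
    throw new Error(`${clause}: expected ${expected}, got ${actual}`);
  }
}
console.log(`${clauses.length} clauses verified`); // 4 clauses verified
```

An auditor can read the clause list without reading the implementation, and because the checks run on every change, the list cannot silently drift away from the code.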

What TDD is not

Four misunderstandings persist stubbornly and slow down teams that are adopting the method. Clearing them up is the prerequisite for any sober conversation about TDD.

  • TDD is not a guarantee of bug-free code. Tests check behaviour against expectations. If the expectation is wrong, the test is wrong too. TDD removes a class of mechanical defects — conceptual errors remain.
  • TDD is not "test everything". Coverage as an end in itself produces tests that check nothing but their own existence. The right question is "which behaviour matters?", not "which line is still uncovered?".
  • TDD does not replace architectural work. The big structural decisions — module boundaries, bounded contexts, data model — happen up front, not inside a test. TDD operates at the small scale; architecture at the large.
  • TDD does not work for every kind of code. UI animations, exploratory algorithmic research, one-off data migrations, throwaway prototypes — the cost-to-benefit ratio here is often poor. Discipline includes knowing when not to apply it.
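
The first point can be made concrete with formatPrice from the example above, assuming the business expects commercial rounding (half up). The decimal 1.005 has no exact binary floating-point representation; the stored value lies slightly below it, so toFixed(2) rounds down. A test written by copying the function's output passes, while a test written from the half-up rule fails. Both bars report honestly whether code and test agree; only the business can say which expectation is right.

```typescript
// formatPrice as refactored in the example above (export omitted so the sketch runs standalone).
function formatPrice(value: number): string {
  const formatted = value.toFixed(2).replace('.', ',');
  return `${formatted} €`;
}

// The double closest to 1.005 is 1.00499999999999989..., so toFixed(2) rounds down.
const actual = formatPrice(1.005);

// Expectation copied from the implementation's output: the test would be green.
const expectationFromCode = '1,00 €';
// Expectation derived from a "round half up" business rule: the test would be red.
const expectationFromRule = '1,01 €';

console.log(actual); // 1,00 €
console.log(actual === expectationFromCode); // true
console.log(actual === expectationFromRule); // false
```

Working in integer cents avoids this particular trap, but the rounding decision still has to be stated explicitly in a test.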

Pitfalls in practice

Where TDD fails, it's rarely the method's fault. Five recurring problems explain most of the cases:

  • Brittle tests. Tests coupled to the implementation rather than the behaviour. Every refactoring breaks twenty tests — and the team starts seeing tests as a burden. The remedy: test through the public interface, not through implementation details. Use mocks sparingly, only at system boundaries.
  • Broken test pyramid. Too many slow integration tests, too few fast unit tests. The suite runs for 20 minutes — nobody starts it before committing. The remedy: aim deliberately for roughly 70 % unit, 20 % integration, 10 % end-to-end and defend that ratio.
  • Test-doubles chaos. Mocks, stubs, fakes and spies mixed wildly, every team member with their own preference. A short reminder helps: stubs return canned answers, fakes are working but simplified implementations (e.g. an in-memory database), mocks verify interactions (was this method called with these arguments?), spies record calls without replacing behaviour. The remedy: pick one library per technology stack (for example Mockito for Java, Sinon.js for JavaScript, NSubstitute for .NET) and a few shared conventions, enforced in code review.
  • TDD as dogma. When all you have is a hammer, everything looks like a nail. Even within one project there are areas where TDD carries and areas where it doesn't. Acknowledge both, make both explicit.
  • Discipline erosion under pressure. In the sprint endgame, tests are the first thing dropped. The remedy: anchor tests in the Definition of Done, refuse to merge code without tests — and in an emergency, cut scope rather than quality.
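
The four kinds of test double can be told apart in a few lines. The sketch below writes them by hand, purely to show the roles side by side; it does not reproduce the API of Mockito, Sinon.js or NSubstitute. The Checkout service, its two boundaries TaxRates and OrderRepository, and all class names are hypothetical. The doubles sit only at the boundaries, and the first test checks an observable outcome rather than internal calls.

```typescript
// Two hypothetical system boundaries: tax-rate lookup and persistence.
interface TaxRates {
  rateFor(country: string): number; // rate as a fraction, e.g. 0.19
}
interface OrderRepository {
  save(id: string, grossCents: number): void;
  find(id: string): number | undefined;
}

// Code under test: computes the gross amount and stores it.
class Checkout {
  private readonly rates: TaxRates;
  private readonly repo: OrderRepository;
  constructor(rates: TaxRates, repo: OrderRepository) {
    this.rates = rates;
    this.repo = repo;
  }
  placeOrder(id: string, netCents: number, country: string): number {
    const grossCents = Math.round(netCents * (1 + this.rates.rateFor(country)));
    this.repo.save(id, grossCents);
    return grossCents;
  }
}

// Stub: canned answer, no logic.
const rateStub: TaxRates = { rateFor: () => 0.19 };

// Fake: a working but simplified implementation (in memory instead of a database).
class InMemoryOrders implements OrderRepository {
  private readonly rows = new Map<string, number>();
  save(id: string, grossCents: number): void { this.rows.set(id, grossCents); }
  find(id: string): number | undefined { return this.rows.get(id); }
}

// Mock: knows the expected interaction and verifies it afterwards.
class OrdersMock implements OrderRepository {
  private readonly calls: Array<[string, number]> = [];
  private readonly expected: [string, number];
  constructor(expected: [string, number]) { this.expected = expected; }
  save(id: string, grossCents: number): void { this.calls.push([id, grossCents]); }
  find(): number | undefined { return undefined; }
  verify(): void {
    const [id, cents] = this.expected;
    if (!this.calls.some(([i, c]) => i === id && c === cents)) {
      throw new Error(`save(${id}, ${cents}) was not called`);
    }
  }
}

// Spy: records calls but delegates, so behaviour is unchanged.
class OrdersSpy implements OrderRepository {
  readonly savedIds: string[] = [];
  private readonly inner: OrderRepository;
  constructor(inner: OrderRepository) { this.inner = inner; }
  save(id: string, grossCents: number): void {
    this.savedIds.push(id);
    this.inner.save(id, grossCents);
  }
  find(id: string): number | undefined { return this.inner.find(id); }
}

// Preferred style: stub + fake, assert on the observable outcome.
const orders = new InMemoryOrders();
new Checkout(rateStub, orders).placeOrder('A-1', 10000, 'DE');
console.log(orders.find('A-1')); // 11900

// Interaction style: the mock verifies the call itself (throws if it is missing).
const mock = new OrdersMock(['A-2', 11900]);
new Checkout(rateStub, mock).placeOrder('A-2', 10000, 'DE');
mock.verify();

// Spy around a fake: the call is recorded and the order is still stored.
const spy = new OrdersSpy(new InMemoryOrders());
new Checkout(rateStub, spy).placeOrder('A-3', 10000, 'DE');
console.log(spy.savedIds.join(','), spy.find('A-3')); // A-3 11900
```

With stub and fake, a refactoring of Checkout's internals leaves the test untouched as long as the stored amount is right. The mock-based test breaks as soon as the call to save changes shape, which is exactly the coupling the brittle-tests point above warns about.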

Practical note

Brittle tests are the most common reason teams abandon TDD: "We tried it, it doesn't work for us." In most of these cases the method isn't the problem; test granularity is.

When TDD pays off — a pragmatic recommendation

TDD is not a universal answer; it is a tool with a clear profile. The honest way to handle it is to know when it pays off and when it doesn't.

Clearly worthwhile:

  • Business logic with clear contracts — calculations, validations, workflows, decision rules.
  • Regulation-sensitive code paths where auditors need to verify that a rule was implemented correctly.
  • Long-lived platforms that will evolve over years and outlast multiple generations of developers.
  • High-complexity code where manual testing becomes disproportionate.

Hardly worthwhile:

  • Prototypes whose lifetime is measured in days.
  • UI code with a heavy visual component — end-to-end tests and visual snapshots are usually a better fit.
  • Research code whose outcome emerges only through iteration and whose requirements change daily.

The pragmatic middle path: TDD for core logic; classic tests as needed at the edges (glue code, adapters, boilerplate); prototypes without tests, but with a clear stop signal for the moment they cross into production. That transition is where most debt accumulates, and where discipline matters most.

Recommendation

In our projects TDD isn't the standard for every line of code, but it is the standard for every stretch of code that counts in an audit. We measure success not by coverage numbers but by two indicators: how long it takes to reproduce a bug, and how often a refactoring is blocked by the test suite instead of being protected by it. Once bugs can be reproduced in minutes and refactorings rarely fail because of the suite, the discipline has taken hold.

Code review or test-strategy workshop?

We work with your team on your test landscape — coverage profile, pyramid shape, brittle tests, CI runtimes — and translate the findings into a concrete action plan, tuned to the maturity of your platform.
