Coding Was Never the Bottleneck

I once sped up a software development team by telling the engineers to stop coding. It was the most effective — and most unpopular — decision I've ever made.

The team was struggling to ship a release every two months. When they did get a release out the door against a hard deadline, it took cherry-picking and rollbacks, and the shortcuts carried into the next release as debt. The engineers were talented. The code was good. And yet delivery was a disaster.

The problem wasn't the code. The problem was the rate at which code was being produced. Engineers were generating features far faster than QA could validate them. Work finished by developers stacked up untested for weeks. When the deadline hit, there was too much to test and not enough time to do it properly.

My intervention was simple and brutal: once engineers finished the stories committed for a sprint, they could not pull new work until all the tests were written and passing. No new code until the current code was verified.

The engineers were furious. Writing code is what engineers do — it's part of their identity. Being told to stop and help QA write test cases felt like a demotion. QA was equally unhappy about the intrusion. I paid for it in my upward reviews.

The results were undeniable. Within two sprints, features were shipping every two weeks like clockwork. We went from roughly ten features every two months to five features every two weeks: more than double the throughput, shipped about four times as often. Not by writing more code. By stopping the code until the system could absorb what we'd already built.

Eliyahu Goldratt, who developed the Theory of Constraints, would have recognized the pattern immediately. The bottleneck was never the engineers. It was testing. And speeding up everything except the bottleneck just makes the bottleneck more painful.


I'm watching the same pattern play out now, at scale, across the industry.

Organizations have invested heavily in AI coding tools. They've done everything right — deployed agents, trained the teams, tracked adoption metrics. GitHub Copilot usage is up. Pull request volume is up. The dashboards look great.

And yet, at the end of the quarter, when leadership asks how many features shipped, the number hasn't moved much. Sometimes it's worse.

The math doesn't add up. If developers are writing code 55% faster, as GitHub's controlled experiment with Copilot found for a single well-defined programming task, why isn't that showing up as faster delivery?

Goldratt answered this question forty years ago. We just weren't paying attention.


Local Optima: The Trap That Looks Like Progress

In his Theory of Constraints, Goldratt identified one of the most persistent failure modes in complex systems: the local optimum. It looks like improvement. The metrics look better. But the system as a whole doesn't improve — and sometimes actively gets worse.

The principle is blunt: an hour saved at a non-bottleneck is a mirage. If you speed up a step that isn't the constraint, work piles up faster at the step that is. You haven't improved throughput. You've improved your ability to create a bigger backlog at the real bottleneck.
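To make the mirage concrete, here is a toy simulation. It assumes a three-stage pipeline (requirements, coding, testing) with made-up daily capacities; it is a sketch of the principle, not a model of any real team. Raising coding capacity from 4 to 10 items a day leaves shipped work unchanged.

```python
def simulate(capacities: list[int], days: int, arrivals_per_day: int) -> tuple[int, list[int]]:
    """Run a toy pipeline and return (items shipped, items queued before each stage)."""
    queues = [0] * len(capacities)  # work waiting in front of each stage
    shipped = 0
    for _ in range(days):
        queues[0] += arrivals_per_day  # new ideas enter the first stage
        for i, cap in enumerate(capacities):
            done = min(cap, queues[i])  # a stage can't finish more than is waiting
            queues[i] -= done
            if i + 1 < len(capacities):
                queues[i + 1] += done  # hand the work to the next stage
            else:
                shipped += done  # the last stage ships it
    return shipped, queues


# Stages: requirements, coding, testing (hypothetical capacities per day).
before = simulate([5, 4, 2], days=20, arrivals_per_day=5)
after = simulate([5, 10, 2], days=20, arrivals_per_day=5)  # coding sped up
print(before)  # (40, [0, 20, 40])
print(after)   # (40, [0, 0, 60])
```

The shipped count is set entirely by the slowest stage. The extra coding capacity shows up only as a longer queue in front of testing.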

Niko Heikkilä captured it precisely in his analysis of AI-augmented development:

"Following Eli Goldratt's Theory of Constraints, AI-augmented programming is a perfect candidate for the illusion of local optima where we attempt to improve the total performance of the system by improving the performance of an individual cog only to find out it doesn't yield the expected results. Consider a team enthusiastically embracing AI coding assistance. Within a couple of weeks, the developers generate code at incredible velocity. However, their testing and review columns are piling up unfinished work. [...] Meanwhile, the UI/UX designers are struggling to keep pace and stress that the developers are moving too fast for them."

That description will be familiar to anyone who has watched an AI coding rollout from the sidelines.

The uncomfortable truth: coding was never the bottleneck.


Where the Time Actually Goes

Before we can fix the constraint, we have to name it honestly.

Programming — actually writing code — accounts for only 25% to 35% of the time from initial idea to product launch, according to Bain & Company's 2025 Technology Report. The rest is understanding user needs, clarifying requirements, designing solutions, aligning stakeholders, reviewing work, testing hypotheses, and navigating organizational friction.

These non-coding activities are precisely where the most significant delivery challenges emerge. Requirement ambiguity. Stakeholder misalignment. Integration issues. Handoffs between disciplines.

AI coding tools just made the minority of work — writing code — dramatically faster. The majority of work is untouched.
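A quick Amdahl's-law calculation shows the ceiling. This sketch takes Bain's 25% to 35% coding share at face value, reads "55% faster" as 55% less time spent coding (an interpretation, not a figure from either source), and assumes every other step takes exactly as long as before.

```python
def remaining_lead_time(coding_share: float, coding_time_cut: float) -> float:
    """Fraction of the original idea-to-launch time left after speeding up coding.

    coding_share: fraction of lead time spent writing code (0 to 1).
    coding_time_cut: fraction of coding time removed (0.55 means 55% less time).
    All non-coding steps are assumed to take exactly as long as before.
    """
    return (1.0 - coding_share) + coding_share * (1.0 - coding_time_cut)


for share in (0.25, 0.35):
    with_ai = remaining_lead_time(share, 0.55)
    free_coding = remaining_lead_time(share, 1.0)  # coding takes no time at all
    print(f"coding share {share:.0%}: lead time falls by {1 - with_ai:.1%}; "
          f"ceiling is {1 - free_coding:.0%}")
```

Under those assumptions, idea-to-launch time shrinks by roughly 14% to 19%, and even infinitely fast coding could not cut it by more than the coding share itself.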

When you speed up coding without touching requirements and testing, something predictable happens: your developers finish features faster, and those features stack up waiting for the slower parts of the pipeline to process them. Sprint ceremonies also take the same absolute time even as individual tasks shrink, so their share of each cycle grows. Standups, planning sessions, and retrospectives don't compress when coding does. The bottleneck hasn't moved. It's just more visible now.


Drum-Buffer-Rope: Why Your Sprint Length Was Never a Choice

Most agile teams think of sprint length as a planning decision — a deliberate choice balancing feedback frequency against stability. Two weeks feels right. Maybe three. You've always done it this way.

It wasn't a choice. It was a constraint artifact.

Goldratt's Drum-Buffer-Rope model describes how throughput in any system is governed by its slowest step — the drum. Work intake (the rope) should be tied to the drum's pace. The buffer exists to keep the drum from going idle.
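A minimal sketch of the rope, under simplifying assumptions: steps upstream of the drum are treated as instantaneous, so released work lands directly in the buffer, and the drum finishes a fixed number of items per day. The function and parameter names are illustrative only.

```python
from collections import deque


def drum_buffer_rope(backlog: list[str], drum_rate: int,
                     buffer_size: int, days: int) -> list[tuple[int, int, int]]:
    """Simulate rope-controlled intake; return (day, released, finished) per day.

    drum_rate: items the constraint (the drum) finishes per day.
    buffer_size: items kept queued in front of the drum so it never starves.
    Upstream steps are treated as instantaneous, a deliberate simplification.
    """
    pending = deque(backlog)  # work not yet released into the system
    buffer = deque()          # work waiting in front of the drum
    log = []
    for day in range(1, days + 1):
        # Rope: release only enough work to refill the buffer to its target.
        released = 0
        while pending and len(buffer) < buffer_size:
            buffer.append(pending.popleft())
            released += 1
        # Drum: the constraint works at its own fixed pace.
        finished = 0
        while buffer and finished < drum_rate:
            buffer.popleft()
            finished += 1
        log.append((day, released, finished))
    return log


backlog = [f"feature-{n}" for n in range(20)]
for row in drum_buffer_rope(backlog, drum_rate=2, buffer_size=4, days=5):
    print(row)
# (1, 4, 2)
# (2, 2, 2)
# (3, 2, 2)
# (4, 2, 2)
# (5, 2, 2)
```

After the first day fills the buffer, intake settles at exactly the drum's rate of two items a day. The rest of the backlog waits outside the system instead of piling up inside it.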

In software delivery, the drum is the slowest step that blocks downstream work. Before AI, that was often a combination: planning required human synthesis of requirements, coding required human implementation, and testing required human verification. Sprint length emerged as the minimum viable batch size to make the overhead of ceremonies economically worth it.

Here's the math that explains why two-week sprints became standard:

A 10-day sprint with 1.5 days of ceremony overhead is 15% overhead. Painful but manageable. Shrink to a 2-day sprint with the same overhead and you're now spending 75% of your time in meetings. The model breaks.

Sprint length wasn't chosen. It was the shortest cycle where the overhead ratio was still viable.
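The overhead arithmetic is easy to check, holding ceremony time fixed at 1.5 days:

```python
def overhead_ratio(sprint_days: float, ceremony_days: float) -> float:
    """Share of a sprint consumed by ceremonies (planning, standups, review, retro)."""
    return ceremony_days / sprint_days


for sprint_days in (10, 5, 2):
    print(f"{sprint_days:>2}-day sprint: {overhead_ratio(sprint_days, 1.5):.0%} overhead")
# 10-day sprint: 15% overhead
#  5-day sprint: 30% overhead
#  2-day sprint: 75% overhead
```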

AI changes this calculation. Not because it speeds up coding — we've already established that coding wasn't the drum. AI changes it because it can also compress the slow steps: requirements elaboration and test generation.

When the drum speeds up, the rope pulls smaller batches. Sprint length compresses — not arbitrarily, but because the overhead ratio is now viable at smaller batch sizes.
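A rough way to see why: if a team caps ceremony overhead at some tolerable share of each cycle, the shortest viable sprint is ceremony time divided by that cap. The 20% cap and the shrinking ceremony times below are hypothetical values chosen for illustration, not measurements.

```python
def shortest_viable_sprint(ceremony_days: float, max_overhead: float) -> float:
    """Shortest sprint (in days) whose ceremony share stays at or below max_overhead."""
    return ceremony_days / max_overhead


MAX_OVERHEAD = 0.20  # hypothetical tolerance; each team would set its own

# Hypothetical ceremony times as AI compresses elaboration and test generation.
for ceremony_days in (1.5, 0.5, 0.2):
    days = shortest_viable_sprint(ceremony_days, MAX_OVERHEAD)
    print(f"{ceremony_days} days of ceremony -> sprints of {days:.1f}+ days are viable")
# 1.5 days of ceremony -> sprints of 7.5+ days are viable
# 0.5 days of ceremony -> sprints of 2.5+ days are viable
# 0.2 days of ceremony -> sprints of 1.0+ days are viable
```

Cut ceremony time and the shortest viable cycle shrinks in proportion. The batch size follows the overhead, not a calendar convention.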


What It Looks Like When You Attack the Real Bottleneck

AWS has published a methodology called the AI-Driven Development Lifecycle (AI-DLC) that describes exactly what this looks like in practice. It replaces traditional sprints with "bolts" — shorter, more intense work cycles measured in hours or days rather than weeks.

The methodology centers on two practices that directly attack the drum:

Mob Elaboration — The entire team, with AI assistance, collaborates to transform business intent into detailed requirements, user stories, and acceptance criteria. What previously consumed days of back-and-forth between product owners, architects, and developers is compressed into a focused session. AI drafts, the team validates, clarifies, and redirects in real time.

Mob Construction — Architecture, code, and tests emerge from a structured human-AI collaboration where the team actively validates AI proposals rather than passively reviewing code after the fact.
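One way to picture what the team checks during Mob Elaboration is a readiness test over the AI's draft: every intent has stories, and every story has acceptance criteria before construction starts. The `Intent`, `Story`, and `gaps` names are hypothetical and are not part of AWS's AI-DLC.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    title: str
    acceptance_criteria: list[str] = field(default_factory=list)


@dataclass
class Intent:
    """A business goal, plus the stories drafted from it."""
    goal: str
    stories: list[Story] = field(default_factory=list)


def gaps(intent: Intent) -> list[str]:
    """List what the team still has to resolve before construction starts."""
    problems = []
    if not intent.stories:
        problems.append(f"intent '{intent.goal}' has no stories yet")
    for story in intent.stories:
        if not story.acceptance_criteria:
            problems.append(f"story '{story.title}' has no acceptance criteria")
    return problems


draft = Intent(
    "Let customers pause a subscription",
    [
        Story("Pause from the account page",
              ["Given an active subscription, when the customer pauses it, "
               "no further charges are made until it is resumed"]),
        Story("Resume a paused subscription"),  # drafted, but criteria still missing
    ],
)
print(gaps(draft))  # ["story 'Resume a paused subscription' has no acceptance criteria"]
```

In a live session the AI produces the draft, and the humans close the gaps the check surfaces.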

The result AWS describes:

"AI rapidly generates and refines artifacts, such as requirements, stories, designs, code, and tests allowing product owners, architects, and developers to complete tasks in hours or days that previously took weeks."

Amazon Kiro takes a similar approach. Its agent can take a feature request, unpack it into user stories with acceptance criteria, produce design documents, generate implementation tasks, write unit tests, and update documentation automatically. The drum isn't faster because code writes itself. The drum is faster because AI has compressed requirements elaboration and test scaffolding, which were previously slow, labor-intensive steps.

A practitioner experiment points the same way: one team ran ten one-day sprint cycles over two weeks. It "forced the team to figure out critical skills and practices like finding small slices of value and collaborating — skills you can easily ignore in a 2-week sprint." Shorter cycles produced more learning per unit of time, which is what Goldratt would predict when you remove batch-size constraints.

Agile coaches have observed the same principle in practice: a team's effectiveness correlates more with the number of iterations it completes than with their length. More cycles mean faster feedback, and faster feedback means better decisions. The sprint length was always holding back the learning rate.


The 24/7 Agent Problem

There's a sharper version of this problem that architects and technical leaders should be thinking about now.

AI coding agents don't sleep. They don't context-switch. They don't get pulled into incident calls. A well-configured agent can generate, review, and iterate on code around the clock.

If your team runs agents 24/7 but your requirements process still batches work into two-week planning cycles — you're feeding a fire hose into a funnel. The agents are waiting. Features are piling up in various states of partial completion. The value isn't reaching customers any faster because the bottleneck is still upstream.
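A small calculation shows why agent speed barely matters under batched intake. It assumes, hypothetically, that one new idea arrives per day, that agents can build any idea in one day with unlimited parallel capacity, and that work can only start at a planning boundary.

```python
def average_lead_time(planning_interval_days: int, build_days: int, horizon_days: int) -> float:
    """Average days from an idea arriving to an agent finishing it.

    Hypothetical assumptions: one idea arrives at the start of each day, agents
    have unlimited parallel capacity, and work starts only at a planning
    boundary (day 0, then every planning_interval_days). An interval of 1
    models continuous intake.
    """
    total = 0
    for arrival in range(horizon_days):
        # Round the arrival day up to the next planning boundary.
        start = -(-arrival // planning_interval_days) * planning_interval_days
        total += (start - arrival) + build_days
    return total / horizon_days


print(average_lead_time(planning_interval_days=14, build_days=1, horizon_days=28))  # 7.5
print(average_lead_time(planning_interval_days=1, build_days=1, horizon_days=28))   # 1.0
```

With two-week planning, the average idea takes 7.5 days from arrival to finished build even though the build itself takes one. With continuous intake, the same agents deliver in one day. The wait lives entirely in the intake process.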

Meanwhile, your customers are increasingly aware that they don't have to wait. A technically sophisticated user can take their requirements directly to Claude or GPT, describe what they need, and have a working proof of concept in an afternoon. Not production-ready software — but often good enough to solve the problem they were waiting six months for you to solve.

This is the customer-as-competitor dynamic that architects need to take seriously. It's not hypothetical anymore. Your speed advantage over a customer's self-service option is shrinking, and every month you spend optimizing code generation while leaving requirements and testing as manual batch processes shrinks it further.


What Actually Needs to Change

The path forward isn't complicated, but it requires an honest reckoning with where the real work is happening.

Audit the value stream, not the code metrics. Pull request volume and code generation speed are local-optimum metrics. The measure that matters is cycle time from business intent to value delivered to a customer. Map the full stream and find where work actually waits.

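Apply Goldratt's four questions to your delivery process. What is the power of the new technology? What limitation does it diminish? What rules did we adopt to accommodate that limitation? What rules should we use now? Then find the real constraint in your system: not the step you most recently improved, but the step that limits total throughput. Most teams will find it in requirements clarity, testing capacity, or deployment friction, not in the speed of writing code.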

Redesign ceremonies for the drum speed you actually have now. If AI has genuinely compressed your planning and testing cycles, your sprint length should reflect that. Running two-week sprints with AI-compressed overhead is like the manufacturers that adopted MRP (material requirements planning), got their reports overnight, and still waited a month to act on them.

Evaluate structured methodologies like AI-DLC. AWS's published approach isn't experimental — it's a documented, enterprise-scale methodology for restructuring the entire development lifecycle around AI capabilities. The Mob Elaboration pattern in particular directly addresses the requirements bottleneck that most teams are ignoring.
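Auditing the value stream, the first step above, can start small. This sketch assumes you can export, for each work item and stage, when the item entered that stage's queue, when work started, and when it finished; the stage names and timestamps below are hypothetical. It totals queued (not worked) time per stage and reports end-to-end cycle time.

```python
from collections import defaultdict
from datetime import datetime

# One record per item per stage: (item, stage, queued_at, started_at, finished_at).
Record = tuple[str, str, datetime, datetime, datetime]


def wait_hours_by_stage(records: list[Record]) -> dict[str, float]:
    """Total hours items sat queued (not being worked) in front of each stage."""
    waits = defaultdict(float)
    for _item, stage, queued_at, started_at, _finished_at in records:
        waits[stage] += (started_at - queued_at).total_seconds() / 3600
    return dict(waits)


def cycle_time_hours(records: list[Record], item: str) -> float:
    """Hours from the item first entering the stream to its last stage finishing."""
    rows = [r for r in records if r[0] == item]
    return (max(r[4] for r in rows) - min(r[2] for r in rows)).total_seconds() / 3600


# Hypothetical timestamps for a single feature moving through three stages.
records = [
    ("F-1", "requirements", datetime(2026, 5, 1, 9), datetime(2026, 5, 4, 9), datetime(2026, 5, 4, 17)),
    ("F-1", "coding", datetime(2026, 5, 4, 17), datetime(2026, 5, 5, 9), datetime(2026, 5, 5, 12)),
    ("F-1", "testing", datetime(2026, 5, 5, 12), datetime(2026, 5, 12, 9), datetime(2026, 5, 12, 15)),
]
waits = wait_hours_by_stage(records)
print(waits)  # {'requirements': 72.0, 'coding': 16.0, 'testing': 165.0}
print(max(waits, key=waits.get))  # testing
print(cycle_time_hours(records, "F-1"))  # 270.0
```

In this made-up example, coding took three hours and waited sixteen, while the item sat in front of testing for 165 hours. That waiting time, not the coding time, is what the audit is meant to surface.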


The Constraint Is Visible Now

The teams that see no velocity gains from AI coding adoption aren't doing it wrong. They're doing it locally right — they've genuinely improved the coding step. But a local optimum is still a local optimum.

Goldratt's observation about MRP systems in the 1980s holds: the companies that captured order-of-magnitude gains weren't the ones who ran better reports. They were the ones who changed how they made decisions when better information became available.

AI hasn't just made coding faster. It's made the constraint visible.

Your agents can run overnight. The question is whether you'll change the process to match.


About Brad Jolicoeur

Principal Architect with 20+ years building and transforming engineering organizations. Wharton Executive CTO Program graduate. Writing about architecture, distributed systems, production AI, and engineering leadership.
