How AI Changes Software Delivery
A field-engineering take on which parts of the SDLC actually change with AI in the loop — and the parts that look like they should change but don't.
The default conversation about AI and software delivery is breathless and binary: everything is different or nothing is. Neither has matched what we actually see in the field. The honest answer is more interesting: a handful of stages collapse dramatically, a handful change shape, and a handful are stubbornly the same.
The three categories
- Collapses: hours → minutes
- Reshapes: same work, new artifacts
- Unchanged: stubbornly human
Collapses
These are the parts where the time-to-output drops by an order of magnitude:
- Boilerplate scaffolding. Setting up a Next.js app with auth, a DB layer, and a deploy pipeline used to be a day. It's now a half-hour and most of that is choosing.
- First-draft documentation. API docs, READMEs, architecture overviews — the first draft is essentially free. The work moves to editing.
- Test scaffolding. Generating the test surface around a module — the cases you'd write if you had unlimited patience — is now reasonable to actually do.
- Code archaeology. "What does this 4-year-old service do?" used to be a multi-day spelunk. With a capable agent and the source tree, it's an afternoon.
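Concretely, "the cases you'd write if you had unlimited patience" means exhaustive, mechanical enumeration of edge cases around one small unit. A minimal sketch in Python, where `slugify` is a hypothetical helper invented for illustration and the case table is the kind an agent produces cheaply:

```python
import re

def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim dashes.
    A hypothetical helper used only to illustrate the test surface."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# The edge cases a generator will happily enumerate and a human rarely does:
CASES = [
    ("Hello, World!", "hello-world"),        # punctuation collapses
    ("  leading spaces", "leading-spaces"),  # surrounding whitespace trimmed
    ("already-slugged", "already-slugged"),  # idempotent on valid input
    ("", ""),                                # empty input
    ("!!!", ""),                             # nothing survives
    ("CamelCaseInput", "camelcaseinput"),    # case folding, no word splitting
]

def test_slugify_cases():
    for raw, expected in CASES:
        assert slugify(raw) == expected, (raw, expected)
```

The point is not that these tests are clever; it's that enumerating them used to cost more patience than it was worth, and now it doesn't.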
Reshapes
These look the same on the surface but the artifact you produce is different:
- PRDs become executable. A PRD used to be input to a human roadmap. It's increasingly input to an agent's plan. That changes how you write it — tighter constraints, explicit non-goals, fewer adjectives.
- Code review shifts upstream. The valuable review moment is no longer "is this implementation correct?" — the agent's implementation usually compiles. It's "is this the right thing to implement?" Reviews happen on plans, prompts, and contracts more than diffs.
- Pair programming becomes pair-with-the-agent-then-pair-with-the-human. You catch the obvious issues with the model in the loop; you save the hard tradeoffs for the human conversation.
- Onboarding compresses. A new hire with Claude Code and a decent CLAUDE.md is productive in days instead of weeks. But what "productive" means is different — they're shipping with scaffolding, not necessarily with a deep mental model yet.
Unchanged
These are the parts that look ripe for disruption but in practice resist it:
- Distributed systems debugging. The hard part was never "what does this stack trace mean." It was reasoning about state across machines, time, and partial failure. Agents help at the margins but don't change the shape of the work.
- Cross-team alignment. Getting four orgs to agree on an API contract is a political problem, not a generation problem.
- Incident command. The blameless culture, the calm voice on the call, the decision to roll back vs. patch forward — all human.
- Taste. Knowing which of three plausible designs is the right one for this team, this quarter, this codebase — still the senior engineer's job.
What this means for how teams should change
A few load-bearing implications, in rough order of leverage:
- Hire and train for taste, not throughput. The differential value of "good judgment about what to build" has gone up. The differential value of "fast at writing the loop" has gone down.
- Invest in your prompts and your CLAUDE.md files like infrastructure. They are infrastructure. Treat them with the same version control, review, and ownership you'd give a service.
- Move review upstream. The team that catches problems at the plan stage will outship the team that catches them at the PR stage, ten times over.
- Pick the agent-friendly stack. Mainstream frameworks with good docs, conventional patterns, and clear file layouts get materially better agent output than exotic ones. Novelty now carries a real, ongoing cost.
- Build the demo culture. Internal demos — five-minute walkthroughs of how someone solved something with the agent — become the most important knowledge transfer mechanism. They scale where the conference talk doesn't.
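To make "treat CLAUDE.md like infrastructure" concrete, here is a hedged sketch of what a version-controlled, owned CLAUDE.md fragment might contain. The team name, paths, and commands are illustrative assumptions, not conventions from any standard:

```markdown
# CLAUDE.md  (owned by @platform-team; changes reviewed like any service change)

## Conventions the agent must follow
- TypeScript strict mode; no `any` without a `// why:` comment.
- New endpoints go through `src/api/routes/`; never import the DB client directly.

## Non-goals (do not generate)
- No new frameworks or ORM migrations without a linked design doc.

## How to verify your work
- `npm test` must pass; add cases for every new branch.
```

The load-bearing detail is the ownership line at the top: a file nobody owns drifts, and a drifting prompt degrades every agent session that reads it.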
A closing thought
The framing question I keep coming back to: what work were we previously not doing because it wasn't worth the time, that we should do now because it is? The teams that answer that well will quietly out-execute the teams asking the wrong question — "how do we replace headcount?" — by a margin that compounds every quarter.
This isn't a productivity story. It's a portfolio-of-bets story. AI didn't make engineering cheaper. It made the work that used to be uneconomical, economical.