What I learned about AI-assisted development from building a production service in 83 days

After 14 years of writing code professionally, I thought I had a pretty good handle on what “moving fast” meant in software development. I’ve shipped features under impossible deadlines, refactored legacy systems while they were running in production, and led teams through the chaos of hypergrowth. I thought I’d seen it all.
Then last year, I was part of a project that completely rewired my understanding of what’s possible when you architect a system specifically for AI-assisted development.
I want to share what we learned taking a distributed inference platform from zero to production in just 83 days with a team of 9 engineers. This isn’t a thought experiment—it’s a real system handling millions of requests, and the lessons have fundamentally changed how I approach building software.
How We Got Here
It started, as many projects do, with firefighting.
I was brought in to help diagnose what everyone assumed was a capacity problem in an existing inference service. The system was struggling under load, latency was spiking unpredictably, and the ops team was getting buried in alerts. Classic scaling issues, right?
After two weeks of digging through metrics and traces, I realized capacity wasn’t the problem. The architecture was.
The original system had been designed like a traditional web service—the kind where you can assume requests are roughly uniform and load balancers can distribute work evenly. But inference workloads are fundamentally different. One request might complete in 200 milliseconds. Another might run for 45 minutes. Try load balancing that.
We explored several options: rearchitecting the routing layer, implementing custom scheduling, adding request classification. But the more we dug, the more we realized we were trying to patch a foundation that couldn’t support what we needed to build.
So we made the call to start fresh.
The Constraint That Changed Everything
Here’s where it gets interesting. Leadership wanted this done fast—really fast. My initial estimate was 12-14 months for an MVP with a team of 8-10 engineers. That estimate was based on my experience shipping similar systems; it accounts for the inevitable discoveries, the integration challenges, the testing cycles.
The response was essentially: “You have four months. Figure it out.”
I’ve been in this situation before. Usually you either negotiate the scope down, or you staff up and accept the coordination overhead. Neither option felt right this time.
About five weeks in, we’d made decent progress but I could see the trajectory wasn’t going to work. We were maybe 15% done with roughly a third of the timeline already consumed; at that rate we’d need nearly double the four months we had. The math was brutal.
That’s when I started seriously experimenting with using LLMs for more than just code completion.
How AI-Assisted Software Development Made the Impossible Rewrite Possible
I spent a weekend working on a throwaway side project—a small service completely unrelated to our main work. I wanted to understand what was actually possible with AI-assisted development without risking the codebase we’d already built.
What I discovered surprised me. When I designed the project structure specifically around how the models work best—small, focused modules, comprehensive documentation inline with the code, extensive test coverage—the AI wasn’t just helpful. It was transformative.
I came back on Monday with a wild idea: what if we rewrote everything we’d built so far, but structured it from the ground up for AI-assisted development?
My team thought I’d lost it. We’d spent five weeks on that code. But I did a proof of concept over three days, rewriting our core routing logic with AI assistance. The new version had better test coverage, clearer documentation, and handled edge cases the original had missed.
We made the call to start over. Eleven weeks later, we were in production.
Let Me Be Clear: This Is Not Vibe Coding
I need to address something directly because I’ve seen too many people get this wrong.
What we did is the exact opposite of vibe coding. I despise that term. It implies you’re letting the AI drive while you sit back and hope for the best. That’s a recipe for disaster in any production system.
I understand every line of code in our system. I can trace request flows in my head. When we get paged at 2 AM (and we do), I can diagnose issues just as effectively as if I’d typed every character myself.
The AI is a force multiplier for my expertise, not a replacement for it. Think of it like power tools versus hand tools. A table saw lets you cut wood faster than a hand saw, but you still need to know how to measure, what cuts to make, and how to not lose a finger.
If you can’t explain what your code does and why, you have no business shipping it—regardless of whether a human or an AI wrote it.
Understanding Why Traditional Development Is Slow
To appreciate why AI assistance can be so powerful, you need to understand where time actually goes in traditional development. I’ve tracked this obsessively over the years, and the results might surprise you.
Information gathering (20-30% of time): Before you write a line of code, you need to understand what to build and how it fits with existing systems. Documentation is usually outdated or nonexistent. You end up reading code, Slacking colleagues, scheduling meetings. For a medium-complexity feature, this easily burns 2-3 days.
Actual coding (15-25% of time): Here’s the thing nobody talks about—typing code is fast. What’s slow is the constant context-switching. You write for 30 minutes, get pulled into a standup, come back and spend 15 minutes remembering where you were. The actual coding might be 4 hours of work spread across 3 days.
Code review cycles (15-20% of time): Submit a PR, wait for CI, wait for reviewers, address feedback, wait for CI again, get another round of comments. Even with a responsive team, this is 1-2 days minimum. With timezone differences or busy reviewers, it can stretch to a week.
Testing and validation (20-30% of time): Unit tests are just the start. Integration testing, staging deployments, manual verification of edge cases. If your test infrastructure is flaky (and whose isn’t?), add more time for reruns and false-positive investigation.
Deployment and monitoring (10-15% of time): Change management processes, deployment windows, rollout monitoring, rollback procedures if something looks wrong.
Add it up: a feature that’s maybe 200 lines of code change can easily take 3-4 weeks from start to finish. I’ve seen simple bug fixes take longer than that when the stars misalign.
The insight that changed everything for me: AI can compress almost all of these phases, but only if your system is designed to enable it.
The Architecture Decisions That Made It Work
Here’s what we did differently:
Monorepo With Documentation as Code
Everything lives in one repository. Every microservice, every configuration file, every piece of documentation. We even keep our architecture decision records and runbooks in the repo.
Why? Because the AI needs context to be useful. If your documentation lives in Confluence, your configs in a separate repo, and your runbooks in a wiki, the AI is working with one hand tied behind its back.
We also made a rule: no PR gets merged without updating relevant documentation. This sounds obvious but it’s amazing how quickly docs drift from reality. When the AI can read accurate documentation alongside the code, its suggestions are dramatically better.
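A rule like that only sticks if something enforces it. Here’s a rough sketch of a CI check that could do the job; the directory layout and the base branch are assumptions for illustration, not our actual setup:

```typescript
// Fail the build if code changed but no documentation moved with it.
import { execSync } from "node:child_process";

const changed = execSync("git diff --name-only origin/main...HEAD", {
  encoding: "utf8",
})
  .split("\n")
  .filter(Boolean);

const touchedCode = changed.some((file) => file.startsWith("services/"));
const touchedDocs = changed.some(
  (file) => file.startsWith("docs/") || file.endsWith(".md"),
);

if (touchedCode && !touchedDocs) {
  console.error(
    "Code changed but no documentation was touched. Update the relevant docs or runbook.",
  );
  process.exit(1);
}
```

Even a blunt check like this goes a long way toward keeping the documentation honest.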
Test Coverage as a First-Class Concern
We have over 4,000 tests in the repository. Every module, every function with meaningful logic, every edge case we’ve encountered. Our coverage isn’t 100%—that’s a vanity metric—but every critical path is thoroughly exercised.
Here’s why this matters for AI-assisted development: when the AI makes a change, it can immediately validate that change against comprehensive tests. If something breaks, it gets instant feedback and can iterate. Without good tests, you’re flying blind.
We also invested heavily in making tests fast. Our full suite runs in under 3 minutes. If tests take 20 minutes, the feedback loop breaks down and the AI (and the humans) lose the context of what they were trying to do.
Local-First Development
This might be our most important technical decision: the entire system can run on a single laptop.
We built in-memory implementations of every external dependency. Our message queue has a local mode. Our database layer can run against SQLite. Our distributed cache falls back to a simple in-process map.
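Here’s a minimal sketch of the fallback pattern; the interface and names are illustrative, not our actual code:

```typescript
// Service code depends only on this interface, never on a concrete client.
interface Cache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Local mode: a plain in-process Map with expiry timestamps. Good enough for
// a laptop and for tests; production would swap in a client for the real
// distributed cache behind the same interface.
class InMemoryCache implements Cache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | undefined> {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt < Date.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, {
      value,
      expiresAt: Date.now() + ttlSeconds * 1000,
    });
  }
}
```

The service code only ever sees the interface, and configuration decides at startup whether it gets the in-memory version or a client for the real dependency.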
When the AI is iterating on a change, it can spin up the entire service, run end-to-end tests, and verify behavior—all locally, in seconds. No waiting for cloud resources, no flaky network connections, no shared environments getting polluted by other developers’ experiments.
I cannot overstate how much this accelerates development. Changes that would require a deploy-and-pray cycle in most systems can be verified completely locally in ours.
TypeScript Was the Right Choice (For Us)
We built the system in TypeScript, and while I won’t claim it’s the right choice for everyone, it was crucial for our AI-assisted workflow.
TypeScript’s compiler catches entire categories of bugs at build time. Type mismatches, null and undefined access, unhandled cases in a union: with strict checks on, they all surface as compile errors rather than runtime failures. This is perfect for AI-assisted development because the feedback loop is immediate.
When the AI generates code with a subtle bug, we don’t discover it in production at 3 AM. We discover it in 2 seconds when the compiler yells at us. The AI can then iterate, using the compiler’s (remarkably helpful) error messages as guidance.
I’ve watched the AI untangle gnarly generic type errors that would take a junior TypeScript/Node.js developer hours to understand. The compiler’s feedback is so structured and specific that the AI can reason about it effectively.
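A toy example (not from our codebase) of the kind of guarantee I mean: model a request lifecycle as a discriminated union, and the compiler enforces that every state is handled.

```typescript
type RequestState =
  | { kind: "queued" }
  | { kind: "running"; startedAt: number }
  | { kind: "done"; durationMs: number };

function describe(state: RequestState): string {
  switch (state.kind) {
    case "queued":
      return "waiting for a worker";
    case "running":
      return `running for ${Date.now() - state.startedAt} ms`;
    case "done":
      return `finished in ${state.durationMs} ms`;
    default: {
      // If a new state is added to RequestState and not handled above, this
      // assignment stops compiling ("not assignable to type 'never'")
      // instead of failing at runtime.
      const unhandled: never = state;
      return unhandled;
    }
  }
}

console.log(describe({ kind: "running", startedAt: Date.now() }));
```

Add a new state and forget to handle it, and the build fails on the spot instead of paging someone later.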
Observability Without Shell Access
We made a deliberate choice: no SSH access to production machines. Zero. Everything is observable through logs and metrics, but nobody can log into a box and poke around.
This seemed restrictive at first, but it forced us to build proper observability. Every important event is logged. Every state transition emits metrics. Every error includes enough context to diagnose remotely.
The unexpected benefit: this architecture works beautifully with AI assistance. I’ve built tools that let the AI query our logging system directly. It can pull logs across multiple services, correlate timestamps, identify patterns—all the tedious investigation work that used to eat hours of my time during incidents.
During my last on-call rotation, I’d estimate the AI handled 70% of the initial investigation for every alert. It would pull relevant logs, identify anomalies, and present a hypothesis before I’d even finished reading the alert description.
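The tools themselves are thin wrappers. Here’s an illustrative sketch of the shape of one; the service names, endpoint, and response format are assumptions, not our real logging API:

```typescript
// Fetch logs for one request ID across every service and return a single,
// timestamp-ordered timeline for the model to reason over.
interface LogLine {
  service: string;
  timestamp: string; // ISO 8601
  requestId: string;
  message: string;
}

async function logsForRequest(requestId: string): Promise<LogLine[]> {
  const services = ["router", "scheduler", "worker"]; // hypothetical names
  const perService = await Promise.all(
    services.map(async (service) => {
      const url = `https://logs.internal/query?service=${service}&requestId=${encodeURIComponent(requestId)}`;
      const res = await fetch(url);
      return (await res.json()) as LogLine[];
    }),
  );
  // ISO timestamps sort correctly as strings.
  return perService
    .flat()
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```

Building the correlated timeline is the tedious part; the pattern-spotting the model does on top of it is where the hours come back.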
Context Management: The Secret to Effective AI-Assisted Software Development
If there’s one skill that separates effective AI-assisted development from frustrated flailing, it’s context management.
The models have large context windows—200K tokens is common now, and Anthropic’s Claude and similar models offer even more. But a larger context doesn’t mean better results. In my experience, model performance degrades noticeably past 80-100K tokens, as if the model’s attention budget gets spread too thin.
Start fresh constantly. I reset my AI session after almost every completed task. One PR, one session. This feels wasteful at first—aren’t you losing all that context?—but it forces a discipline that pays off. Every task starts with the AI reading exactly what it needs: the relevant module, the relevant docs, the relevant tests. Nothing more.
Be surgical with tool access. Many AI coding setups load dozens of tools—file search, web search, code execution, database access, API calls. Each tool consumes context window space just by existing. We built a minimal custom toolset: file operations, test running, and log queries. That’s it. The AI is more effective with fewer, focused capabilities.
Instructions at point of use. The AI follows instructions better when they arrive right before the relevant task. Instead of front-loading a massive system prompt with every coding standard and convention, we keep minimal persistent instructions and provide specific guidance when needed.
Show, don’t tell. Instead of writing elaborate rules about coding style, we point the AI at exemplary code and say “follow this pattern.” Models are remarkably good at mimicking. They’re less good at interpreting abstract rules.
The Temptation That Will Burn You
Early in the project, we were under pressure to ship a complex feature fast. Streaming responses—should have been straightforward but had tricky edge cases around connection management and backpressure.
Instead of breaking it into small pieces, we gave the AI a high-level description and said “implement this.” It produced something that mostly worked. Tests passed. We shipped it.
Two weeks later, we were debugging production issues that made no sense. Connections hanging. Memory slowly climbing. Occasional data corruption under load.
When I finally traced it down, I found the AI had made a fundamental architectural mistake in how it managed connection state. And buried in a comment was this gem: “In a production system, this would need more robust error handling.”
It knew. It knew it was cutting corners, and it told us, and we shipped it anyway because we were in a hurry.
That was our most important lesson: the AI will attempt whatever you ask, but it’s not always capable of what you ask. When you push beyond its reliable capabilities, it doesn’t refuse—it does its best and hopes you’ll catch the problems.
Now we have a hard rule: no task that can’t be completed and verified in a single session. If something is too complex for that, break it down until each piece is simple enough to succeed reliably.
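For the curious, the class of bug that bit us is the classic streaming one: ignoring backpressure. Here’s a minimal Node sketch of the shape of a correct handler; it’s illustrative only, not our actual code.

```typescript
import { createServer } from "node:http";

// Hypothetical stand-in for a model streaming tokens.
async function* generateChunks(): AsyncGenerator<string> {
  for (let i = 0; i < 1_000; i++) yield `token-${i}`;
}

const server = createServer(async (_req, res) => {
  res.writeHead(200, { "Content-Type": "text/event-stream" });

  for await (const chunk of generateChunks()) {
    if (res.destroyed) break; // client disconnected; stop producing
    if (!res.write(`data: ${chunk}\n\n`)) {
      // write() returns false when the socket buffer is full; wait for
      // 'drain' instead of letting chunks pile up in memory.
      await new Promise<void>((resolve) => {
        res.once("drain", resolve);
        res.once("close", resolve); // don't hang if the client goes away mid-wait
      });
    }
  }
  if (!res.destroyed) res.end();
});

server.listen(8080);
```

Nothing here is exotic, which is the point: it’s exactly the kind of detail that a rushed, high-level prompt lets slip through.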
The Learning Curve Is Real
I track commit frequency across our team. Every single person has had a “hockey stick” moment—a point where their productivity suddenly jumped 3-5x. But nobody had it immediately. The fastest was about three weeks in. The slowest was almost two months.
You can’t shortcut this. You have to build intuition for what the AI can and can’t do reliably. You have to learn how to structure prompts, how to break down tasks, how to verify output efficiently. It’s a skill like any other.
The engineers who struggled longest were, counterintuitively, often the most experienced. They had deeply ingrained workflows that had been successful for years. Adapting those patterns to AI assistance required unlearning as much as learning.
My advice: start with tasks that feel almost insultingly simple. “Add a log statement here.” “Rename this variable.” “Write a test for this function.” Build confidence in the small stuff before attempting anything ambitious.
The Virtuous Cycle
Here’s something I didn’t anticipate: the productivity gains compound.
When you’re shipping faster, you have more time to invest in tooling and automation. Better tooling makes you faster. Faster development means more time for tooling. It spirals upward.
Three months in, we had built custom tools that I would never have had time for in a traditional project. Log analysis automation. Deployment helpers. Testing utilities. Each one made us faster, which freed up time to build more.
We also had time for something that usually gets cut: paying down tech debt as we went. When you’re not constantly behind schedule, you can fix that awkward abstraction before it becomes load-bearing. You can refactor while context is fresh instead of six months later when everyone’s forgotten why things work the way they do.
What Has to Change
For this to work broadly, not just in greenfield projects with hand-picked teams, several things need to change:
Testing culture needs to evolve. Most teams treat tests as a checkbox, not a critical capability. Flaky tests get disabled instead of fixed. Coverage gaps get accepted because “we’ll add tests later.” This won’t work in an AI-assisted world. The AI needs reliable, fast tests to validate its work.
Local development needs investment. Too many systems can only be tested by deploying to shared environments. That was always slow; now it’s a critical bottleneck. Teams need to invest in local-first architectures, even for complex distributed systems.
Documentation needs to be code. If it’s not in the repo, the AI can’t see it. If it’s not version-controlled, it drifts from reality. Wikis and external docs have their place, but the authoritative source for how the system works needs to live with the code.
Process needs to flex. Many organizations have accumulated layers of process—reviews, approvals, change management—that assume human-speed development. When velocity increases 5-10x, these processes become the bottleneck. We need to rethink what controls are actually necessary versus which are just inherited tradition.
This Isn’t the First Revolution
It’s easy to be skeptical of claims about 10x productivity improvements. We’ve heard that before about plenty of technologies that didn’t deliver.
But step back and look at our industry’s history. Assembly to high-level languages. Manual deployment to CI/CD. Monoliths to microservices. Each transition felt impossible to those who lived through it, then became obvious in hindsight.
I remember senior engineers in 2010 insisting that automated testing would never replace manual QA. I remember architects in 2015 arguing that microservices were unnecessary complexity. They weren’t stupid—they were reasoning from their experience, which didn’t include the new paradigm.
We’re in another transition. AI-assisted development isn’t going to replace programmers—I’m more convinced of that than ever after this project. But it is going to change what programming looks like. The teams that figure this out first will move at speeds that seem impossible to those still working the old way.
The Bottom Line
After 83 days from first commit to production, supporting multiple model architectures across several deployment regions, I can’t go back to the old way of working. It would feel like typing the codebase out on a typewriter.
The combination that worked for us:
- Monorepo with living documentation
- Comprehensive, fast test suites
- Local-first development architecture
- Strong typing with compile-time checks
- Disciplined context management
- Small, high-confidence tasks
- Humans who understand every line
This isn’t about AI replacing developers. It’s about AI amplifying developers. The human expertise matters more than ever—you need to know what to build, how to structure it, and how to verify it’s correct. But the tedious parts—the boilerplate, the syntax, the test scaffolding, the log analysis—can be radically accelerated.
When your cycle time drops from weeks to hours, you don’t just do the same work faster. You work differently. You experiment more. You fix things that would have festered. You build tools you never would have had time for.
The future belongs to the engineers who figure out how to work with AI effectively—not the ones who fear it, and not the ones who trust it blindly. It belongs to the ones who treat it as the most powerful tool we’ve ever had, and learn to wield it with skill.
I’m still learning. We all are. But I know one thing for certain: I’m never going back.