Generative AI in Software Engineering: A Balanced Perspective on Where We’re Headed

Generative AI in software engineering is not about replacing engineers; it’s about redefining where human judgment and accountability must remain.

The tech industry is buzzing with excitement about generative AI. Bold claims are flying everywhere, from executives boasting that 30% of their code is AI-written to predictions that AI models will soon “join the workforce” as full-fledged engineers.

But amid all this hype, I noticed something troubling: almost nobody was taking a balanced, practical look at what these tools can actually do, where they fall short, and how we should thoughtfully integrate them into our work.

That gap inspired me to dig deeper, interview dozens of senior engineers, and put together my findings. This post shares what I learned, not to tell you what’s right, but to get you thinking critically about how generative AI should fit into your engineering practices.

The Polarization Problem

When I started exploring this topic, I noticed the conversation had become incredibly polarized.

On one side, industry leaders were making extraordinary claims. Microsoft’s CEO mentioned that 20-30% of their source code was written by AI. OpenAI suggested their models would essentially become workforce members. Meta claimed to have AI that performs like a mid-level engineer. I even heard a senior manager claim that “an engineer can now do in 2 days what they used to do in 2 months.”

On the other side, my academic friends admitted they were being deliberately negative precisely because industry was being so positive.

Nobody seemed interested in finding the practical middle ground and understanding what these tools genuinely excel at, where they struggle, and how to use them responsibly.

Moving Beyond Doubt to Explore What Really Matters

Look, I’ve spent years taking highly complex systems from development to production, turning them into actual products people use. But when it came to this research, I almost backed out. Why? Because I’m not a data scientist. I’ve never trained a model myself. And that made me wonder: do I even have the right to write about this?

What pushed me forward was remembering that expertise isn’t fixed. Just because I wasn’t a master at something yet didn’t mean I couldn’t develop that mastery through dedicated effort. Plus, I had something valuable: a broad network of senior engineers across many different organizations, built over 15 years and 14 different teams.

Four Core Concepts Every Engineer Should Understand

Through my research, I identified four fundamental concepts that shape how we should think about generative AI in engineering.

1. The Production-Consumption Reversal

Before generative AI, creating content took far longer than consuming it. Writing a paper might take weeks; reading it takes an hour. Writing code for a feature might take days; reviewing it takes minutes.

Generative AI has flipped this ratio dramatically.

Now, producing code, documents, or designs can happen in seconds or minutes. But our ability to consume that output, to review it, understand it, and verify it, hasn’t changed at all.

This creates a fundamental problem. Our teams and processes were built around the old ratios: many producers creating work, with only a fraction of their time going to consumption activities like code review. If production suddenly explodes while consumption capacity stays flat, our entire workflow breaks down.
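
To make the imbalance concrete, here’s a back-of-the-envelope sketch. Every number in it is an illustrative assumption, not a measurement; plug in your own team’s figures.

```python
# Back-of-the-envelope model of the production/consumption imbalance.
# All numbers are illustrative assumptions, not measurements.

ENGINEERS = 10
REVIEW_SHARE = 0.2          # fraction of each engineer's week spent reviewing
PRS_PER_ENGINEER = 5        # pull requests produced per engineer per week, pre-AI
REVIEW_HOURS_PER_PR = 1.5   # hours of careful review one PR needs
HOURS_PER_WEEK = 40

# Review hours available per week, assuming the review share stays flat.
review_capacity = ENGINEERS * REVIEW_SHARE * HOURS_PER_WEEK

for speedup in (1, 2, 5, 10):  # hypothetical AI production multipliers
    prs = ENGINEERS * PRS_PER_ENGINEER * speedup
    demand = prs * REVIEW_HOURS_PER_PR
    shortfall = max(demand - review_capacity, 0)
    print(f"{speedup:>2}x production: {demand:6.0f}h of review needed, "
          f"{review_capacity:.0f}h available, weekly shortfall {shortfall:.0f}h")
```

Even at a modest 2x speedup, the review demand in this toy model is nearly double the available capacity. The gap has to close somehow: more reviewers, shallower reviews, or fewer changes shipped.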

2. The Core of Engineering is Accountability

After extensive discussions with senior engineers, I’ve distilled what I believe is the essential value of an engineer: being accountable for the quality of the systems they build.

It’s not just about building something. It’s about ensuring that system meets customer needs, behaves as required, and doesn’t create unforeseen consequences.

This accountability doesn’t mean achieving perfection. It means executing to the right quality level to deliver required results.

As AI takes over more production tasks, maintaining this accountability becomes our central challenge. We need processes that preserve human responsibility for quality even when machines do the building.

3. Curating Where Critical Thought Goes

Given that we can’t magically increase our consumption capacity, we need to be strategic about where we apply human attention.

One extreme would be throwing everything at the machine and giving it full accountability. That’s not viable: current tools simply can’t be held accountable for quality, and I don’t see that changing in the foreseeable future.

The other extreme, manually reviewing everything as if AI didn’t exist, wastes the productivity gains these tools offer.

The answer lies somewhere in the middle: thoughtfully deciding which pieces of our systems and workflows can be delegated to machines, and which require human oversight.

4. Natural Language Over Nondeterministic Systems

Here’s a fundamental challenge with current AI coding tools: we use natural language (which is inherently ambiguous) to prompt a nondeterministic system, expecting to get consistent, high-quality output.

That’s two layers of chaos stacked on top of something we need to be reliable and definitive.

Try this experiment: open the Mona Lisa in one tab. Then try to describe her to an AI image generator: the exact angle of her head, the precise curve of that ambiguous smile, the sfumato haze over the Tuscan landscape, all without ever naming the painting or the artist.

You’ll fail. These tools aren’t designed for precise specification and exact output. They fail in unpredictable ways, and multiple iterations often make things worse rather than better.

Code generation has similar characteristics. I once spent an hour trying to get ChatGPT to make two bars in a simple chart the same height. It could manage when the chart was generated as a labeled graph with axes, but as a plain image? Impossible.

What AI Coding Tools Are Actually Good At

Let me be clear: these tools have genuine strengths.

Greenfield development is where they shine. When building from scratch without constraints from existing systems, AI can be incredibly productive. The output might be prototype-quality with imperfect code structure, but for getting something working quickly, they’re excellent.

Small, well-scoped problems also work well. When you can define a tight problem space and provide appropriate context, AI assistants can be genuinely helpful.

Where They Struggle

Problems emerge when you need something in between: larger-scope work that requires understanding existing context, patterns, and systems.

In these scenarios, AI tools can:

  • Generate redundant or unnecessary classes
  • Delete important configurations for no apparent reason
  • Produce code with variable quality and poor readability
  • Create structures that don’t support modularity or extension

The most dangerous failure mode is silent failure.

Examples I’ve encountered or heard about:

  • When unable to reach an external service, generating fake data in the expected format and proceeding as if it were real
  • When asked to resolve an alarm, deleting the underlying metric instead of fixing the problem
  • Creating tests that look legitimate with proper setup, mocks, and structure but essentially just return true without testing anything

Without careful human oversight, these silent failures can slip into production and cause real problems.
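
To make that last example concrete, here’s a hedged sketch of what a hollow test can look like. The names are hypothetical; the point is that all the scaffolding of a real test is present, yet nothing about the actual system is ever verified.

```python
import unittest
from unittest.mock import MagicMock


class TestPaymentProcessor(unittest.TestCase):
    """Has setup, mocks, and structure -- and verifies nothing real."""

    def setUp(self):
        # Hypothetical payment gateway, fully mocked out.
        self.gateway = MagicMock()
        self.gateway.charge.return_value = {"status": "ok"}

    def test_charge_succeeds(self):
        # The assertion only checks the mock's canned return value,
        # so it passes no matter what the production code does.
        result = self.gateway.charge(amount=100)
        self.assertEqual(result["status"], "ok")


if __name__ == "__main__":
    unittest.main()
```

A reviewer skimming for structure sees a passing suite; only a careful read reveals that the system under test never actually appears.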

A Disturbing Trend: Humans Leaving the Loop

During my interviews, I learned something concerning about how some teams are using AI code reviewers.

When AI marked “Ship it” on code reviews, multiple engineers were simply rubber-stamping the same approval without actually reviewing the code themselves.

Meanwhile, other engineers were generating code with AI tools and submitting it for review without ever reading the full code themselves.

Put these together, and we potentially have code in production that has never been read by a human. That’s a significant departure from established engineering practices.

A Framework for the Future: Risk Assessment

So how do we move forward responsibly? I propose starting with a risk assessment framework.

Break your systems into components and assign each a risk level and confidence score.

Risk level reflects potential damage from system misbehavior. Systems handling customer data, processing payments, or affecting safety should have high risk levels because failures carry serious consequences.

Confidence score reflects your mitigations: automated tests, alarms, rollback procedures, and security reviews. Essentially, how much have you done to catch problems before they reach customers?

Components with high risk and low confidence need significant human oversight. Don’t let AI run unsupervised in those areas.

Components with low risk and high confidence might be candidates for more autonomous AI operation with lighter oversight.

This isn’t about blanket rules. It’s about your team thoughtfully evaluating your specific systems and making informed decisions about where human attention matters most.
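
As a starting point, here’s a minimal sketch of what that assessment could look like in code. The component names, scores, and thresholds are assumptions for illustration; your team would substitute its own inventory and policies.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    risk: int        # 1 (low) .. 5 (high): damage if this component misbehaves
    confidence: int  # 1 (low) .. 5 (high): strength of tests, alarms, rollbacks, reviews


def oversight_level(c: Component) -> str:
    """Map risk and confidence to a coarse human-oversight policy."""
    if c.risk >= 4 and c.confidence <= 2:
        return "human review of every change; no unsupervised AI"
    if c.risk >= 4:
        return "AI may draft; a human reviews every AI-generated change"
    if c.confidence >= 4:
        return "AI may operate with periodic spot checks"
    return "AI may draft; a human approves before merge"


# Hypothetical inventory -- replace with your own systems and scores.
inventory = [
    Component("payment-service", risk=5, confidence=2),
    Component("internal-dashboard", risk=2, confidence=4),
    Component("notification-worker", risk=3, confidence=3),
]

for c in inventory:
    print(f"{c.name:22} risk={c.risk} confidence={c.confidence} -> {oversight_level(c)}")
```

The value isn’t in the code; it’s in forcing the team to write down, per component, how much autonomy they’re actually comfortable granting.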

A Longer-Term Vision: Deterministic Bounding

Here’s a more speculative idea for how engineering might evolve: what if engineers focused on defining constraints rather than implementations?

Imagine the AI’s output as a black box we don’t fully understand internally. How could we have confidence in something we can’t inspect?

By surrounding it with deterministic tests that enforce behavior.

In this model:

  • Product managers and designers provide context and requirements (similar to today)
  • Engineers translate natural language requirements into machine-readable test specifications
  • AI implements the actual system
  • Tests deterministically verify the system meets behavioral and operational requirements

The tests become the spec. They’re what assert quality.
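
As a toy illustration of “tests as the spec,” here’s what a couple of deterministic constraints might look like. The checkout module and checkout_total function are hypothetical stand-ins for whatever interface the AI-implemented black box exposes, and the thresholds are illustrative.

```python
import time

import pytest

# Hypothetical AI-implemented black box; we rely only on its interface.
from checkout import checkout_total


def test_total_applies_discount_then_tax():
    # Behavioral constraint written by the engineer:
    # 100 minus a 20% discount is 80; adding 10% tax gives 88.
    assert checkout_total(subtotal=100.0, tax_rate=0.10, discount=0.20) == pytest.approx(88.0)


def test_empty_cart_is_free():
    assert checkout_total(subtotal=0.0, tax_rate=0.10, discount=0.0) == 0.0


def test_latency_budget():
    # Operational constraint: however the implementation evolves,
    # it must stay within the agreed latency budget (50 ms here, illustrative).
    start = time.perf_counter()
    checkout_total(subtotal=100.0, tax_rate=0.10, discount=0.0)
    assert time.perf_counter() - start < 0.05
```

As long as every such test passes, the AI is free to restructure or optimize the implementation; the moment a behavior or budget regresses, the deterministic boundary catches it.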

For this to work, we’d need to get much better at scaling our test coverage, building libraries of tests for security, authentication, latency, load handling, and countless other concerns.

The engineer remains accountable. If something goes wrong, it’s not the AI’s fault but the engineer’s responsibility to explain why they didn’t specify the right constraints.

The potential upside? AI could continuously optimize implementations behind the scenes. As long as all tests pass, it could improve efficiency, reduce latency, and evolve the system without human intervention.

The Goal: Better Conversations and Better Judgment

I want to be clear about something: the goal of sharing these ideas isn’t to be right.

The goal is to get you thinking. To spark conversations with your teammates about what makes sense for your systems. To help establish some concrete vision of where we might be headed, so we can make better decisions about the steps we take today.

Right now, it feels like the industry is sprinting toward something, but nobody’s articulating what that something actually looks like. We’re running without knowing our destination.

If having a shared understanding of possible futures helps us exercise better judgment along the way, that’s a win.

Key Takeaways

  1. The production-consumption imbalance is real. AI can produce faster than we can verify. Our processes need to adapt.
  2. Accountability remains with engineers. No matter how much AI does, humans must be responsible for quality.
  3. Risk assessment should guide AI adoption. Not all systems deserve the same level of AI autonomy.
  4. Tests may become more important than implementations. Defining expected behavior might matter more than understanding how that behavior is achieved.
  5. Stay skeptical of extraordinary claims. When someone says AI multiplied productivity by 30x, ask for the data.
  6. Don’t use AI for things you can’t audit. If you can’t verify the output, you can’t be accountable for it.

Start the Conversation

The most valuable thing your team can do right now is read about these issues together and discuss what they mean for your specific context.

What’s your risk tolerance? Where does human oversight matter most? How will you prevent silent failures? What experiments can you run to actually measure AI’s impact rather than assuming it?

These conversations are how we’ll navigate this transition thoughtfully and end up somewhere we actually want to be.