State of Mobile 2026: AI to Mobile Dev is Already Here
Leandro Pontes Berleze | May 06, 2026
I spent April 28 and 29 in San Francisco at AI Dev 26, hosted by Andrew Ng and DeepLearning.AI. Thousands of developers, two days, three stages, and one consistent message that came up in almost every session: software engineering is being rebuilt around agents, and the firms still arguing about whether AI helps or hurts productivity are asking the wrong question.
Below are my takeaways: what the industry is getting right, where I see the gap, and how we are operating around it at Cheesecake Labs.
The opening day kept returning to one theme. Anush Elangovan from AMD, Marc Brooker from AWS, and the panel with Replit, LandingAI, Oracle, and Practical Data Media all circled it. The framing was different in every session, but the substance was the same: writing code is no longer where the value sits.
Andrew Ng put it most plainly in his keynote. The bottleneck is no longer code. It is deciding what to build. Once you accept that, the rest of the org chart starts to wobble. If a generalist with a strong product instinct can ship a working prototype in a day, what is the role of a six-person feature team?
If an engineer can supervise three or four agents at once, what is the role of a manager whose job was to coordinate three or four engineers?
The honest answer most leaders avoid: those roles do not disappear, but they collapse into each other. Product people will need to ship code. Engineers will need to talk to customers. Managers, as the panel put it, will all become managers of agents.
Seventy-five percent of code shipped at Google is now built with AI. That number gets quoted often. The more interesting number, which came up later in JetBrains’ session, is that 95% of companies are seeing zero return on their AI investment. Both are true at the same time. The difference is not the model. It is the process around the model.
Brandon Middleton’s Vibe Coding Master Class for Replit was the cleanest articulation of the mindset shift. He framed the new work in three pieces: specification (knowing what to ask), review (reading code you did not write), and orchestration (stitching together agents, tools, integrations, and data).
The Karpathy line he quoted is going to stick: the hottest programming language is English.
This matches what we have been pushing at Cheesecake Labs for months. The shift is not from “I write code without AI” to “I write code with AI.” That is a productivity tweak. The actual shift is from assisting to driving. AI does not help you with a task. AI delivers the task. You supervise it.
Reading code you did not write and deciding whether to ship it has become the central skill of the role. That single sentence reframes hiring, training, code review, and team composition.
If there was one talk that crystallized the strategic picture for me, it was Paul Everitt’s session for JetBrains: Code vs. Staff vs. Quality: The Shift to Agentic Engineering. He named what most of us have been circling.
Vibe coding got us excited. It will not get us to production. What will is a stack of practices that already has a name in the more serious corners of the industry.
Spec driven development. Evals at every layer. Harness engineering. Context engineering. Modular, well documented codebases. TDD, with its red and green cycles intact, now applied to specs and to AI generated implementations. QA agents with browser and dev tools wired in. Observability that goes deeper than logs.
The slogan from the OpenAI harness engineering framework that came up in his session is the cleanest summary I have heard for the new work: engineers no longer build the thing, they build the thing that builds the thing. If you hit a problem, fix the cause, not the symptom. Make the human not be the bottleneck.
That last one is the hard part. Most engineering organizations are still structured to put humans in the bottleneck. Code review, sign off, environment approval, testing, deploy. Every one of those steps is a human gate.
In an agent native delivery model, every one of those gates becomes either an agent or a policy that an agent enforces, and the human moves up the stack to architecture, judgment, and trade off decisions.
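To make that concrete, here is a minimal sketch of one human gate, release sign-off, recast as a policy an agent can enforce. Every field name and threshold here is invented for illustration; a real gate would check real CI artifacts, not a dict.

```python
# Hypothetical sketch: a human sign-off gate rewritten as an explicit policy.
# The change passes automatically when every rule holds; a human is pulled in
# only when a rule fails. All field names and thresholds are illustrative.

def policy_gate(change: dict) -> tuple[bool, list[str]]:
    """Return (approved, reasons_to_escalate) for a proposed change."""
    failures = []
    if change.get("test_coverage", 0.0) < 0.8:
        failures.append("coverage below 80%")
    if not change.get("spec_linked", False):
        failures.append("no spec linked to the change")
    if change.get("touches_auth", False):
        failures.append("auth changes always escalate to a human")
    return (not failures, failures)

# A clean change clears the gate with no human in the loop.
ok, reasons = policy_gate({"test_coverage": 0.92, "spec_linked": True})
# ok is True, reasons is empty
```

The point of the sketch is the shape, not the rules: the human stops being an inline approver and becomes the author of the policy, stepping in only on escalations.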
The most underdiscussed numbers from the conference were on the quality side.
Fifty percent more issues are being deployed in many codebases. Prompt injection now appears as a real production category. Only twenty-nine percent of engineers trust the accuracy of AI output. Fewer than thirty percent of enterprise agents reach production at all, a number Diamond Bishop from Datadog cited in The Next 100 Agents: Building the Agent-Native Office.
CodeRabbit’s session, Deploying AI Code Review at Scale: Turning AI Velocity into a Reliable Quality Gate, was direct about this. Pull request volume is up thirty percent. So is the bug rate. Code review depends on context, and the volume of code being generated has outpaced the context most reviewers carry.
Tom Howlett at Sonar made a sharper point in Can LLMs Generate Enterprise Quality Code?. The answer is conditional. They can, if you wrap them in a guide, verify, and solve loop. Ninety percent of issues are caught at the earliest moment if you instruct the agent to analyze its own output before a comprehensive CI check. That sequence matters: analyze, then fix, then verify in CI is faster and cleaner than the lint-and-review pattern most teams still default to.
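The analyze, then fix, then verify sequence can be sketched as a loop. This is not Sonar's implementation or any real API; the analyzer, fixer, and CI stubs below are hypothetical stand-ins to show the ordering, with the cheap self-analysis running before the expensive CI pass.

```python
# Hypothetical sketch of a guide, verify, solve loop. The agent analyzes its
# own output and repairs findings before the comprehensive CI check runs.
# All three helpers are illustrative stubs, not a real tool integration.

def self_analyze(code: str) -> list[str]:
    """Cheap checks the agent runs against its own draft (stand-in)."""
    issues = []
    if "TODO" in code:
        issues.append("unresolved TODO left in generated code")
    if "except:" in code:
        issues.append("bare except clause")
    return issues

def agent_fix(code: str, issues: list[str]) -> str:
    """Stand-in for asking the agent to repair its own findings."""
    return code.replace("except:", "except Exception:")

def run_ci(code: str) -> bool:
    """Stand-in for the comprehensive CI check (tests, lint, security)."""
    return "except:" not in code

def guide_verify_solve(code: str, max_rounds: int = 3) -> tuple[str, bool]:
    # Analyze, then fix, then verify in CI: cheapest feedback first.
    for _ in range(max_rounds):
        issues = self_analyze(code)
        if not issues:
            break
        code = agent_fix(code, issues)
    return code, run_ci(code)

draft = "try:\n    risky()\nexcept:\n    pass\n"
fixed, ci_ok = guide_verify_solve(draft)
# ci_ok is True: the bare except was repaired before CI ever ran
```

The ordering is the whole idea: issues the agent can catch itself never consume a CI run or a reviewer's attention.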
The pattern across all these sessions is the same. AI raises the ceiling and the floor at the same time. Without rigor, the floor drops faster than the ceiling rises. Rigor is a process problem, not a model problem.
Paul Everitt referenced a piece by Pragmatic Engineer that called this the third golden age of software engineering. I think that framing is correct, and I think most engineering leaders are underestimating it.
The first golden age was the rise of the personal computer and the move from mainframes to distributed software. The second was the cloud and the move from boxed product to continuous delivery. This third one is moving from humans writing code to humans designing and supervising systems that write code.
What gets emphasized in a third golden age framing, and what gets dismissed in the louder vibe coding framing, is system design. Architectural literacy. Knowing why things are organized the way they are, where the seams are, and how the pieces should compose. Simon Willison is reportedly writing a book on this. Good. We need it.
Spec driven development, context engineering, agentic validation, compounding engineering. Those are the words that will matter. They are also the words that will separate teams that ship working systems from teams that ship demos.
Here is how we are operating around this at Cheesecake Labs. First, AI is no longer treated as an assistant. It is the primary execution layer. The human role is to specify, review, and orchestrate. That posture is non-negotiable internally. If someone is still using AI as autocomplete, we treat it as a training gap and close it.
Second, the harness matters more than the model. Skills, context files, project level wikis inside repositories, reusable prompts, well structured spec templates, evals, QA agents. None of that is glamorous. All of it compounds. Teams that invest in the harness pull ahead and stay ahead. Teams that chase the next model release stay flat.
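As one example of unglamorous harness work that compounds, here is a minimal eval harness: spec-level cases with explicit assertions, run against whatever agent the team is currently using. The agent here is a stub and the containment checks are deliberately crude; real evals would call a model and use richer scoring. Everything named below is an assumption for illustration.

```python
# Hypothetical minimal eval harness. Each case pairs a prompt with simple
# containment assertions on the agent's output. The agent is a stub; in a
# real harness it would be a model or agent invocation.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    must_contain: list[str]  # crude assertions; real evals score richer criteria

def fake_agent(prompt: str) -> str:
    """Stub standing in for a real agent call."""
    if "slug" in prompt:
        return "def slugify(s): return s.lower().replace(' ', '-')"
    return ""

def run_evals(cases: list[EvalCase], agent) -> list[tuple[str, bool]]:
    """Run every case and record pass/fail per prompt."""
    results = []
    for case in cases:
        output = agent(case.prompt)
        passed = all(token in output for token in case.must_contain)
        results.append((case.prompt, passed))
    return results

cases = [EvalCase("write a slug helper", ["slugify", "lower"])]
report = run_evals(cases, fake_agent)
# report: [("write a slug helper", True)]
```

The value is not any single case; it is that the suite outlives model swaps, so every model release is measured against the team's own bar instead of vibes.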
Third, hiring is changing. The profile we look for is closer to a product engineer than a traditional full stack developer. Someone who can move from vague problem to working prototype in less than a day. Someone who can read code they did not write and decide whether to ship it. Someone who can translate a non technical problem into a system specification. Someone who can ship end to end and debug across the seams.
Fourth, learning is now a daily practice, not a quarterly initiative. Thirty minutes a day, every day, was the figure mentioned more than once on stage. I think that is right. The pace of change does not allow for batched learning anymore.
The honest summary of two days at AI Dev 26 in San Francisco is this. The technology is ready. The organizational redesign is not. Most engineering leaders still talk about AI as a tool. The leaders who will look prescient three years from now are the ones already treating it as a redesign of the role.
Augment and innovate, not automate and replace, was one of the closing lines from Paul Everitt’s talk. I would extend it. Augment and innovate, but accept that augmentation, done seriously, is itself a redesign. The job description of an engineer in 2026 is not the job description of an engineer in 2024 with AI tools attached. It is something different, and the firms that name that difference and rebuild around it will be the ones that ship.
This is the work we do at Cheesecake Labs every day. We partner with companies that have outgrown the prototype phase and need to ship AI into production with the rigor it actually requires. Specs, evals, harnesses, agent native delivery, the boring infrastructure that separates demos from durable products. If your team is staring at that gap, get in touch.
Douglas started as a Senior Full-Stack Developer at Cheesecake Labs and is currently a Partner and CBDO at the company.