Code · May 22, 2026

Coding agents are shipping the majority of code at companies like Synthesia. The engineers who are thriving aren't the best coders. They're the best reviewers.

As agentic coding tools become standard infrastructure in 2026, the defining skill gap is no longer about writing code. It's about knowing when not to trust the code that was written for you — and understanding precisely where AI agents consistently fail.

Synthesia's report was matter-of-fact. Coding agents are now involved in the majority of code shipped by its engineering team. The volume of code changes has increased. The time humans spend reading those changes has not. That gap, more code entering production with the same human review capacity, is the defining technical risk of the current moment in software development, and it is widening at almost every company that has adopted agentic coding tools at scale.

The Pragmatic Engineer's April 2026 survey of over 900 software engineers captures what this transition looks like from inside the engineering team rather than from the product or management layer. The picture is more complicated than the tool vendors' marketing suggests, and more interesting. The productivity gains from AI coding tools are real — and they are extremely unevenly distributed. The engineers extracting the largest gains are not the engineers with the most raw coding talent. They are the engineers who have developed a specific and learnable set of practices around directing and reviewing AI output that most of their colleagues have not yet acquired.

The agentic coding landscape — April/May 2026

Majority: share of code shipped at Synthesia that now involves autonomous coding agents; the volume of changes increased while review time held flat

~30%: share of surveyed engineers who hit monthly usage limits on AI coding tools, concentrated among the highest-value use cases

Cursor 3, Claude Code, Copilot: dominant tools in the 2026 agentic coding stack, each with different strengths, failure patterns, and cost profiles

What "agentic coding" actually means in practice — and what it doesn't

The terminology around AI coding tools has become sufficiently vague that it is worth establishing what "agentic coding" actually describes in the 2026 tool landscape, as distinct from what existed eighteen months ago.

The earlier generation of AI coding tools — GitHub Copilot in its original form, early ChatGPT integrations — functioned as sophisticated autocomplete. They predicted the next line or function based on context. The human wrote the scaffolding; the AI filled gaps. Control remained entirely with the developer, and each AI contribution was a discrete suggestion that could be accepted or rejected before moving forward.

The current generation operates differently. Cursor 3, Claude Code, and the agentic modes of GitHub Copilot can receive a task description — "refactor this authentication module to use the new token validation library" or "write end-to-end tests for the payment flow" — and produce a sequence of code changes that spans multiple files, involves reasoning about dependencies and side effects, and results in a diff that may touch dozens of locations across the codebase. The human's role has shifted from line-by-line author to task director and output reviewer. The agent does the writing. The human decides whether what it wrote is correct.

This is a fundamentally different skill than software development as it was practiced five years ago, and it is worth being precise about what that means for career development and team composition. The ability to write elegant, efficient code from scratch remains valuable — but it is no longer the primary bottleneck in most engineering workflows that have adopted agentic tools. The primary bottleneck is the ability to specify tasks with sufficient precision that agents produce useful output, and to review that output with sufficient rigor to catch the categories of error that agents consistently make.

"Developers are increasingly shifting from 'writing every line of code' to 'reviewing and directing autonomous agents.' The tools have become standard equipment. The practices for using them safely have not."
— Pragmatic Engineer survey analysis, April 2026

The six places AI coding agents consistently fail — and why they matter

The engineers getting the best results from agentic coding tools have, through experience and deliberate attention, developed an understanding of the categories of failure that agents produce reliably. These are not random errors — they are systematic patterns in how large language models handle certain types of programming problems. Understanding them allows an engineer to direct agents more effectively and review output more efficiently.

Error handling and edge cases. AI coding agents are trained on vast amounts of code — including vast amounts of code that handles the happy path correctly and handles edge cases poorly. The generation bias toward common patterns means that agents tend to produce code that works correctly for the inputs the developer explicitly specified and fails silently or incorrectly for inputs they didn't think to mention. Robust error handling — boundary checks, null safety, graceful degradation, meaningful error messages — requires being explicit in the task specification rather than assuming the agent will infer appropriate robustness from context.
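The gap between happy-path code and robust code can be made concrete with a small sketch. The function names and the latency scenario below are hypothetical, chosen only for illustration; the first version is the kind of output an under-specified task tends to produce, and the second spells out the edge cases a reviewer would ask about.

```python
from typing import Optional


def average_latency_happy_path(samples_ms):
    # Works for a non-empty list of numbers; raises ZeroDivisionError on an
    # empty list and TypeError on None, with no helpful message in either case.
    return sum(samples_ms) / len(samples_ms)


def average_latency(samples_ms: Optional[list]) -> float:
    """Mean latency in milliseconds, with explicit edge-case handling."""
    if samples_ms is None:
        raise ValueError("samples_ms must not be None")
    if len(samples_ms) == 0:
        raise ValueError("samples_ms must contain at least one sample")
    for value in samples_ms:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric sample: {value!r}")
        if value < 0:
            raise ValueError(f"negative latency: {value!r}")
    return sum(samples_ms) / len(samples_ms)
```

Whether an empty list should raise or return a default is a product decision, which is exactly why it belongs in the task specification rather than being left to the agent.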

Security context. Agents generate code that solves the stated problem. They do not automatically reason about the security implications of the solution in the context of the broader system. A generated authentication function may use sound cryptographic primitives in isolation while still leaking timing information, for example by comparing secrets with an ordinary equality check. A generated SQL query may be correct for the happy path while being injectable when the input validation assumptions are not met. Security review of AI-generated code requires bringing security context that the agent does not have: knowledge of how the function will be called, what inputs it will receive, and what the consequences of a failure are.
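Both patterns named above can be sketched in a few lines of Python using only the standard library. The `users` table, the function names, and the payload are assumptions made up for this illustration, not taken from any system discussed here.

```python
import hmac
import sqlite3


def tokens_match(supplied: str, expected: str) -> bool:
    # hmac.compare_digest avoids content-based short-circuiting, unlike ==,
    # which can return as soon as it finds the first differing character.
    return hmac.compare_digest(supplied.encode(), expected.encode())


def find_user_unsafe(conn, name):
    # Injectable: the input is spliced directly into the SQL text.
    query = f"SELECT id FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user(conn, name):
    # Parameterized: the driver passes name as data, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"  # hypothetical hostile input
    return find_user_unsafe(conn, payload), find_user(conn, payload)
```

With the hostile payload, the string-built query matches every row while the parameterized one matches none. Both versions behave identically on ordinary names, which is why the difference is easy to miss in a happy-path review.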

Dependency choices. When an agent needs to solve a problem that could be handled by a library, it will select a library from its training distribution. The library it selects may be outdated, may have known vulnerabilities, may have been deprecated, or may have transitive dependencies that conflict with the existing stack. The Grafana breach — which entered through a compromised npm dependency — is a direct illustration of why dependency choices in AI-generated code require explicit audit rather than passive acceptance.
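One small part of a dependency audit can be automated: flagging version specifiers that are not exact pins. The sketch below assumes an npm-style `package.json` manifest and treats anything other than a plain `major.minor.patch` version as unpinned; the function name and the package names in the usage note are hypothetical. It does not check for known vulnerabilities, which still requires an advisory database.

```python
import json
import re

# Exact semantic version, with optional pre-release and build suffixes.
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$"
)


def unpinned_dependencies(package_json_text: str) -> dict:
    """Return {name: spec} for every dependency that is not an exact pin."""
    manifest = json.loads(package_json_text)
    flagged = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            # Ranges (^, ~, >=), tags ("latest"), and URLs all fail the match.
            if not EXACT_VERSION.match(spec):
                flagged[name] = spec
    return flagged
```

For a manifest containing `"example-http": "2.1.0"` and `"example-utils": "^4.0.0"`, only `example-utils` would be reported. A lockfile addresses transitive dependencies, which this sketch ignores.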

Concurrency and state. Race conditions, deadlocks, and state management bugs in concurrent systems are among the most difficult bugs to detect through automated testing and the most systematically underrepresented in AI training data relative to their real-world frequency. Agents generate concurrent code that looks correct — correct locking, correct async patterns — and is subtly wrong in ways that only manifest under specific timing conditions that unit tests don't exercise.
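A check-then-act race is a typical instance of concurrent code that reads as correct. The sketch below uses a hypothetical `SeatInventory` class: `reserve_racy` looks reasonable in a diff, while `reserve` makes the check and the update a single step under a lock. The class, method names, and scenario are assumptions for illustration.

```python
import threading


class SeatInventory:
    """Hypothetical inventory where reserve() must never oversell."""

    def __init__(self, seats: int):
        self._seats = seats
        self._lock = threading.Lock()

    def reserve_racy(self) -> bool:
        # Another thread can run between the check and the decrement,
        # so two callers can both take the last seat.
        if self._seats > 0:
            self._seats -= 1
            return True
        return False

    def reserve(self) -> bool:
        # The check and the update happen together while holding the lock.
        with self._lock:
            if self._seats > 0:
                self._seats -= 1
                return True
            return False

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._seats


def run_reservations(inventory: SeatInventory, attempts: int) -> int:
    """Run reserve() from many threads and count the successes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = inventory.reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)
```

A single-threaded unit test passes for both methods, and the racy version may also pass most multi-threaded runs. That is the point: the reviewer has to reason about the interleaving rather than rely on the test result.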

Architecture coherence over time. This is the failure mode that is least visible in individual code reviews and most consequential over months. Agents have no memory of what they built in previous sessions. Each task is a fresh start. Over time, an agentic codebase tends to accumulate inconsistencies — different naming conventions in different modules, different error handling patterns in different parts of the stack, different approaches to the same problem in different files — because each agent interaction independently produced locally reasonable code without awareness of the broader patterns established in prior interactions.

Test quality. AI agents generate tests readily and confidently. The tests are often structurally correct and provide passing coverage numbers. They are also frequently testing the implementation rather than the behavior — tests that assert specific implementation details rather than the contract that the implementation is supposed to fulfill. These tests pass when they should, fail when the implementation is refactored even when the behavior is preserved, and miss the edge cases that genuine behavior-driven tests would catch. Test coverage numbers from AI-generated tests are a less reliable signal of actual code quality than they appear.
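The difference between testing implementation and testing behavior can be shown with a small, hypothetical component. In the sketch below, `RecentSearches` and both test functions are invented for illustration. The first test inspects a private attribute and would break if the storage were changed to a different container or ordering; the second asserts only what callers can observe.

```python
class RecentSearches:
    """Hypothetical component: keeps the most recent unique queries."""

    def __init__(self, limit: int = 3):
        self._limit = limit
        self._items = []  # internal detail: oldest first, newest last

    def add(self, query: str) -> None:
        if query in self._items:
            self._items.remove(query)  # re-adding moves a query to the front
        self._items.append(query)
        while len(self._items) > self._limit:
            self._items.pop(0)  # drop the oldest entry

    def latest(self) -> list:
        """Public contract: queries from newest to oldest."""
        return list(reversed(self._items))


def test_implementation_coupled():
    # Asserts on private storage; fails after a valid refactor that keeps
    # latest() unchanged but stores items newest-first.
    searches = RecentSearches()
    searches.add("a")
    searches.add("b")
    assert searches._items == ["a", "b"]


def test_behavior():
    # Asserts on the contract: ordering, de-duplication, and the size limit.
    searches = RecentSearches(limit=2)
    for query in ["a", "b", "a", "c"]:
        searches.add(query)
    assert searches.latest() == ["c", "a"]
```

Both tests pass today and both count toward coverage. Only the second would survive a refactor of the internals, and only the second exercises the de-duplication and limit rules.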

The AI code review checklist that high-performing engineers use

Does the error handling cover the realistic failure modes, not just the stated happy path?
Are all inputs to this function validated before use? What happens if they're null, empty, or malformed?
Are the selected dependencies pinned to a specific version? Have they been checked against CVE databases?
If this code runs concurrently, have I reasoned about the race conditions explicitly?
Does this follow the naming and architectural patterns established in the rest of the codebase?
Are the tests testing behavior or implementation? Would they still pass after a valid refactor?
Are there any security implications specific to how this function will be called in production?

The new role that doesn't have a job title yet

Inside engineering organizations that have adopted agentic coding tools at scale, a function is emerging that is not yet consistently named or organizationally recognized. Informally, it is the role of the person who reviews and is accountable for the quality of AI-generated code before it reaches production. Some organizations are calling this an "AI code reviewer." Others are distributing the function across senior engineers as an extension of existing code review practices. A few are hiring specifically for it.

The skill set required for this function is distinct from the skill set of the best traditional software engineers, though they overlap significantly. A strong AI code reviewer needs deep architectural understanding of the system they are reviewing — deep enough to recognize when agent-generated code is locally correct but architecturally inconsistent. They need security intuition sufficient to identify the categories of vulnerability that agents introduce systematically. They need enough knowledge of the testing landscape to evaluate test quality rather than test coverage. And they need the judgment to assess the cumulative architectural drift of a codebase that has been partially authored by agents over an extended period.

This is senior-engineer work. It is not a junior role, and it is not a role that can be filled by someone whose primary qualification is familiarity with the AI tools. The demand for engineers with this specific capability, who understand both the underlying systems and the systematic failure modes of AI-generated code, is currently growing faster than the supply. The compensation premium for this skill set is likely to become visible in the hiring data over the next twelve to eighteen months.

CVE-2026-31431 and what AI-generated code has to do with it

The "Copy Fail" vulnerability — CVE-2026-31431 — is a local privilege escalation in the Linux kernel's algif_aead module, affecting virtually every major Linux distribution running kernels from 2017 onward. A 732-byte Python proof-of-concept exploit is now publicly available. CISA designated it a Known Exploited Vulnerability with a May 15 remediation deadline for federal systems.

The connection to AI-generated code is indirect but worth noting. Security researchers at Hornetsecurity's threat intelligence lab have observed that AI coding agents, when generating code that interfaces with kernel-level features or cryptographic primitives, do not consistently apply the security context that would lead a human security-conscious developer to validate assumptions about the underlying API's behavior. The specific pattern in Copy Fail — involving the kernel's AEAD cryptographic mode handling — represents exactly the category of low-level assumption about underlying system behavior that AI agents handle poorly without explicit security context in the task specification.

This is not a claim that AI-generated code caused the vulnerability. It is an observation that the category of system-interface assumption that underlies this vulnerability is one of the categories where AI code review is most important — and where the engineers who have developed the specific skill of reviewing AI-generated security-critical code are most differentiated from those who have not.

What engineering teams should do in the next 90 days

The 2026 agentic coding landscape is not going to become less complex. The tools will improve. The agents will become more capable. The volume of AI-generated code reaching production will increase. Engineering organizations that develop the practices, culture, and personnel capable of reviewing that code effectively will produce better software with the tools than organizations that deploy the tools without addressing the review gap.

The practical steps are not complicated. They require commitment and consistency rather than technical sophistication. Build explicit code review criteria for AI-generated code that go beyond style and syntax — the criteria need to address the six failure categories identified above as a checklist rather than leaving them to individual reviewer intuition. Identify the senior engineers in the organization who have developed strong AI code review instincts and formalize their role as reviewers and trainers. Track defect rates for AI-generated versus human-written code over time — the data will clarify where the review process is working and where it is not. And invest in the training that develops the judgment to direct agents effectively, rather than assuming the judgment follows automatically from tool exposure.
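Tracking defect rates by origin needs very little tooling once each merged change carries a label. The sketch below assumes a hypothetical record type, `MergedChange`, with an `origin` field set at merge time and a `caused_defect` flag filled in when a later defect is traced back to the change; how those fields get populated is left open.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedChange:
    change_id: str
    origin: str          # "agent" or "human", labeled at merge time
    caused_defect: bool  # True if a later defect was traced to this change


def defect_rates(changes) -> dict:
    """Return the fraction of changes per origin that caused a defect."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for change in changes:
        totals[change.origin] += 1
        if change.caused_defect:
            defects[change.origin] += 1
    return {origin: defects[origin] / totals[origin] for origin in totals}
```

The raw rates are only a starting point. Agent-authored and human-authored changes may differ in size and risk, so a comparison is more meaningful when it is broken down by change type or area of the codebase.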

The transition in software development is real, accelerating, and will not reverse. The question for engineering leaders is whether the organizations they run will be positioned in two years as having developed the capability to use agentic tools with rigor, or as having accumulated a technical debt of poorly reviewed AI-generated code that will take years to work through.

Sources: The Pragmatic Engineer, "Impact of AI on Software Engineers 2026," April 14, 2026 · Synthesia engineering blog, 2026 · DataNorth AI, Q2 2026 Tool Update · Hornetsecurity, Monthly Threat Report, May 2026 · eSecurity Planet, Weekly Roundup, May 2026 · CISA, Known Exploited Vulnerabilities Catalog, CVE-2026-31431

