69 Vulnerabilities Across 15 Apps Built by 5 AI Coding Agents

The Study

In December 2025, security startup Tenzai ran the first systematic, head-to-head security benchmark of the five most popular AI coding agents: Claude Code, OpenAI Codex, Cursor, Replit, and Devin. Each agent was given the same three web application prompts and asked to build them end-to-end. The resulting 15 applications were then subjected to a full security audit.

The Results

The audit found 69 vulnerabilities across the 15 applications, and basic security controls were missing across the board:

SSRF: All five agents introduced Server-Side Request Forgery in a URL preview feature. An attacker could make the server send requests to arbitrary internal URLs, reaching internal services behind the firewall and leaking credentials. Five out of five. One hundred percent.
CSRF protection: Zero of 15 apps had working CSRF protection. Two agents attempted to add CSRF tokens, but both implementations were broken and could be bypassed.
Security headers: Zero of 15 apps set CSP, X-Frame-Options, HSTS, X-Content-Type-Options, or proper CORS policies. Not a single header across any application from any agent.
Broken authentication: Multiple agents generated auth flows with session fixation, missing token expiry, or predictable session identifiers.
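
Two of the gaps listed above, missing security headers and absent CSRF tokens, can be illustrated with a minimal Python sketch. This is not code from the study. The header values, the function names (apply_security_headers, issue_csrf_token, verify_csrf_token), and the token format are assumptions chosen for illustration; a real application would normally rely on its web framework's built-in protections.

```python
import hashlib
import hmac
import secrets

# Hypothetical baseline values; a real app would tune each policy to its needs.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
}


def apply_security_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the response headers with the baseline defaults added.

    Headers the application set explicitly take precedence over the defaults.
    """
    merged = dict(SECURITY_HEADERS)
    merged.update(headers)
    return merged


def _sign(secret_key: bytes, session_id: str, nonce: str) -> str:
    # HMAC-SHA256 over the session ID and nonce, so a token is only
    # valid for the session it was issued to.
    message = f"{session_id}:{nonce}".encode()
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def issue_csrf_token(secret_key: bytes, session_id: str) -> str:
    """Create a random token bound to the given session."""
    nonce = secrets.token_urlsafe(32)  # URL-safe alphabet, never contains "."
    return f"{nonce}.{_sign(secret_key, session_id, nonce)}"


def verify_csrf_token(secret_key: bytes, session_id: str, token: str) -> bool:
    """Check that the token was issued for this session and not altered."""
    nonce, separator, mac = token.partition(".")
    if not separator:
        return False
    expected = _sign(secret_key, session_id, nonce)
    # Compare as bytes in constant time; bytes also tolerate non-ASCII input.
    return hmac.compare_digest(mac.encode(), expected.encode())
```

In this sketch the server would embed the issued token in each form and call verify_csrf_token on every state-changing request, rejecting the request when it returns False.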

Why SSRF Was Universal

The URL preview feature is a common pattern: a user pastes a link, and the server fetches it to generate a card with the page title and thumbnail. The secure implementation restricts the fetch to public URLs and blocks requests to internal addresses, such as the loopback range (127.0.0.0/8), private ranges (10.0.0.0/8 and others), and the cloud metadata address 169.254.169.254.

Every agent implemented the feature. None implemented the restriction. The agents capture the functional requirement (fetch a URL and return metadata) but not the security requirement (never let user input direct a server-side HTTP request to an internal address).
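
The missing restriction can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the check used in the study. The function names (is_public_ip, resolve_public_addresses) and the choice to allow only http and https are assumptions.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: previews only need web URLs


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1 before checking.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_global is False for loopback, private, link-local (including
    # 169.254.169.254), and other reserved ranges.
    return ip.is_global and not ip.is_multicast


def resolve_public_addresses(url: str) -> list[str]:
    """Resolve the URL's host and return its addresses if all are public.

    Raises ValueError for disallowed schemes, unresolvable hosts, or any
    host that resolves to a non-public address.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host: {parts.hostname}") from exc
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"blocked non-public address: {address}")
    return addresses
```

A check like this is necessary but not sufficient. The fetch should connect to one of the validated addresses rather than resolving the hostname a second time, since the DNS answer can change between the check and the request, and every redirect target needs the same validation.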

The Lesson

The 100% SSRF rate is not a fluke. It reveals a systematic gap: AI coding agents optimize for functionality, not for the absence of dangerous behavior. Security is defined by what the code does not do, and AI training data rewards what the code does. Until agents are specifically trained or constrained to model threat vectors, every auto-generated server-side fetch is a potential SSRF.