Anthropic's Mythos AI Builds Working Exploits, Cloudflare Warns

The line between vulnerability discovery and weaponization just got dangerously thin. Anthropic’s Mythos Preview, a security-focused AI model, has crossed a threshold no frontier model has cleared before: it doesn’t just find bugs it chains them into working proof-of-concept exploits.

That’s the central finding from Cloudflare’s security team, which spent several weeks running Mythos Preview against more than fifty internal repositories as part of Anthropic’s invite-only Project Glasswing.

The results carry a serious message for defenders and threat actors alike an AI can now close the gap between “we found a flaw” and “here is a working exploit.”

Earlier frontier models tested by Cloudflare could identify individual vulnerabilities and write coherent write-ups about why they mattered. What they consistently failed to do was finish the job. Exploit chains were left incomplete, and exploitability remained unproven.

Mythos Preview changes that in two concrete ways.

Exploit chain construction enables the model to take multiple low-severity primitives a use-after-free bug, an arbitrary read/write, a return-oriented programming (ROP) gadget and reason about how they combine into a single, higher-severity working exploit. Bugs that would have quietly rotted in a security backlog become actionable attack paths.

Proof generation means the model writes code to trigger a suspected bug, compiles it in a sandboxed environment, runs it, reads the failure, adjusts its hypothesis, and iterates until exploitability is confirmed or ruled out.

Every confirmed finding arrives with a PoC attached, dramatically reducing triage time for security teams.

False positives remain a persistent challenge in AI-driven vulnerability research. Two factors dominate noise rates: programming language (C and C++ codebases produced significantly more false findings than memory-safe languages like Rust) and model bias toward speculative reporting, flooding triage queues with hedged language like “possibly” and “could in theory.”

Mythos Preview noticeably reduces this problem. Its output features fewer hedged conclusions, clearer reproduction steps, and PoC code that compresses the fix-or-dismiss decision considerably.

Cloudflare found that pointing any AI model directly at a full repository produces poor coverage. Effective AI vulnerability research requires a custom execution harness built around several key principles:

Narrow scope — each agent task is scoped to a specific function, attack class, and trust boundary rather than a broad repository sweep
Adversarial review — a second independent agent using a different prompt and model attempts to disprove each finding, catching a significant fraction of noise the first agent misses
Chain splitting — separating “is this code buggy?” from “can an attacker reach this externally?” as distinct tasks produces sharper reasoning on both
Parallel narrow tasks — running roughly fifty concurrent agents on tightly scoped hypotheses, then deduplicating results, outperforms any single exhaustive agent

Their full pipeline spans recon, hunt, validate, gapfill, dedupe, trace, feedback, and report stages — with a final trace stage determining whether attacker-controlled input can actually reach a confirmed bug from an external entry point.

Despite operating under reduced safeguards within Project Glasswing, Mythos Preview exhibited organic refusals in some cases declining to write demonstration exploits, then completing equivalent tasks when framed differently.

Cloudflare flagged this directly: emergent guardrails are not a reliable safety boundary. Any future general availability of capable cyber-focused models will require additional, consistent safeguards layered on top.

The dual-use reality is explicit. The same capabilities that accelerated Cloudflare’s internal bug discovery will accelerate attacks against internet-facing applications.

Architectural defenses that sit in front of applications, limit blast radius, and enable simultaneous global patch rollout are increasingly urgent as the window between vulnerability disclosure and active exploitation continues to shrink.