Cloudflare warns Claude Mythos matches senior researcher in exploit building

Image of Anthropic's security hardening initiative Project Glasswing. /Courtesy of Anthropic Blog

Anthropic's artificial intelligence (AI) model "Claude Mythos" has been found to have vulnerability analysis skills on par with a senior researcher. Analyses also suggested it can go beyond simply finding software (SW) vulnerabilities to combining multiple weaknesses and using them to carry out real-world system attacks.

Cloudflare on the 20th released a report with findings from applying the Claude Mythos preview to more than 50 of its code repositories. To assess Mythos' security risks, Cloudflare accessed the model through the security consortium "Project Glasswing," which includes major corporations and institutions such as Google and Microsoft.

Grant Bugikas, Cloudflare's chief security officer (CSO) who wrote the report, called the Mythos preview "a clear step forward." He also highlighted the model's ability to build exploit chains and produce proofs of concept.

If existing AI models were limited to finding bugs or security issues in individual SW, Mythos can combine several small vulnerabilities to carry out attacks that seize full control of a system.

Regarding the reasoning observed in this process, he said it "looks like the work of a senior researcher, not the output of an automated scanner."

Mythos also wrote code that triggers bugs and ran it in a temporary environment to verify its exploitability. When it did not work as expected, it revised its hypothesis and repeated the attempt on its own.

The report said, "The Mythos preview stands out in that it can complete a single high-risk exploit by linking together low-severity bugs that had been buried in the backlog."

Limits to the safeguards were also identified. Mythos rejected some requests through its own guardrails, but when the questioning method or execution environment changed, it sometimes carried out requests it had previously refused.

Cloudflare warned that these capabilities can be used for both defense and offense. The report noted, "We are acutely aware that this topic is a double-edged sword," adding, "The same capability we used to find bugs in our own code, if it falls into the wrong hands, will accelerate attacks on all applications on the internet."

Because of this performance, simply speeding up security patching has limits, and additional safeguards will be needed before such AI models can be made available to the public.

As a fundamental solution, Cloudflare proposed establishing structural defense systems so that even if vulnerabilities exist, attackers cannot exploit them—such as application access controls, blocking flaw propagation, and simultaneously applying code deployment and fixes.

※ This article has been translated by AI. Share your feedback here.