Anthropic logo /Courtesy of Yonhap News Agency

Hackers believed to be backed by the Chinese government appear to have carried out a large-scale hacking campaign using Anthropic's artificial intelligence (AI) model "Claude."

Anthropic said on the 13th (local time) that in September, hackers attempted intrusions against some 30 targets, including government agencies, major corporations, and financial institutions, and succeeded in some cases. Anthropic did not disclose which corporations and agencies were targeted.

The company said the hackers used an AI coding model called "Claude Code," and that 80%–90% of the attack was automated. Jacob Klein, Anthropic's head of threat intelligence, told The Wall Street Journal (WSJ), "With literally a single click, they carried out the attack with minimal human involvement."

Humans intervened only at a few critical points, to give Claude instructions or to verify its output.

Anthropic blocked the attack and disabled the attackers' accounts, but by then the hackers had already succeeded in breaking in as many as four times. In one attack, they were found to have instructed Claude to query internal databases and extract data.

Anthropic had previously confirmed a case of AI abuse it called "vibe hacking" in June, but human involvement in this attack was far lower than in that earlier case.

Until now, hackers have generally used publicly available open-source models rather than commercial models like Claude for cyberattacks, because commercial models include safety features and restrictions that make them harder to abuse.

In this case, however, the hackers bypassed Claude's restrictions using a method known as "jailbreaking": they tricked Claude into assisting the crime by claiming they were employees of a legitimate security company conducting a defensive penetration test.

Claude also malfunctioned at times in what appear to be "hallucinations," such as generating credentials that did not work or claiming to have extracted secrets when it had only gathered publicly available information.

Anthropic said that as soon as it detected the suspicious activity, it launched an investigation and, over the following 10 days, blocked the accounts and notified relevant agencies in cooperation with authorities.

Addressing concerns that AI models will be abused for hacking, Anthropic explained, "The very capabilities that allow Claude to be used in such attacks are also essential for cyber defense," adding, "Our goal is for Claude, protected by robust safeguards, to help security experts detect and defend against attacks."

※ This article has been translated by AI.