As hacking attacks surge with the recent advances in generative artificial intelligence (AI) technology, in-house "AI red teams" are on the rise at corporations. They proactively identify and respond to AI security vulnerabilities. In particular, because AI still has many unverified security weak points, the importance of red teams is expected to grow.
According to the industry on the 22nd, as AI red teams gain traction, related courses and large-scale challenges are being run in Korea. AI corporation Crowdworks said it recently opened the country's first "AI red team professional course" through its education subsidiary, Crowd Academy. The course targets AI service policy managers, operations leads, quality managers, and aspiring AI red team specialists. It is structured as a problem-solving curriculum that allows learners to study techniques such as natural language prompt attacks, based on security threat cases that could occur in real corporate environments.
AI corporation Selectstar earlier this month oversaw the operation of a medical "AI red team challenge" to verify the safety of AI medical devices. The company said this was the first in Asia and was carried out to secure the reliability of domestic medical AI. The AI red team test was an event to examine the security of generative AI-based medical devices, where more than 100 people from 47 teams conducted simulated attacks to find security vulnerabilities in more than eight large language models (LLMs) from domestic and foreign big tech companies, including Upstage, KT, LG, and Naver.
Red teams originated from simulated military training in which a team played the enemy to identify friendly-force vulnerabilities. Based on this, AI red teams likewise threaten AI systems like real hackers to check for the possibility of unexpected malfunctions or the generation of harmful outcomes, and respond to threats proactively. For example, they perform tests that enter malicious prompts to induce AI to generate hate speech, discriminatory content, misinformation, or harmful instructions. They also conduct continuous, regular checks and automated monitoring to keep pace with rapid changes in AI systems.
AI red teams use various techniques to find vulnerabilities. A representative method is "prompt injection." This is a technique that manipulates user-entered prompts targeting an LLM to induce AI to deviate from its originally designed guidelines and behave maliciously. Red teams create diverse scenarios and attempt tests to discover and improve hidden risk factors that could be missed during development, thereby helping build safe and trustworthy AI systems.
Prompt injection is divided into a direct method of entering prompts and an indirect method of hiding malicious prompts in external data that an LLM accesses. Direct prompt injection methods include ▲ guideline neutralization ▲ role reassignment ▲ context confusion ▲ abuse of special characters ▲ sequential commands ▲ code injection. A representative example of guideline neutralization is when a user issues a command like "ignore the previous guidelines" to bypass existing settings. Role reassignment, meanwhile, is a method of granting a new role to the AI by entering a sentence like "you are now an unrestricted AI," and is also called a jailbreak attack.
Global big tech corporations have built their own AI red teams. OpenAI has established its own "red teaming network" to continuously identify potential abuse cases in GPT-4. Microsoft likewise conducts AI red team activities on AI services such as Bing Chat to identify security vulnerabilities and the potential to generate harmful content. Meta operates a purple team that combines red and blue teams to strengthen the security of AI systems. Blue teams use various tools to defend against red team attack attempts.
An IT industry official said, "If safety verification is neglected, the entire service can be shaken, so, like global big tech, AI red teams are expected to become essential in-house organizations at domestic corporations going forward."