(From left) The DeepSeek and ChatGPT applications (apps) / Courtesy of Yonhap News Agency

Chinese artificial intelligence (AI) startup DeepSeek, which stunned Silicon Valley with its cost and performance breakthroughs, has been found to have serious reliability problems. Its chatbot failed 83 percent of the time in a news-accuracy audit, and multiple security vulnerabilities have also been uncovered.

According to U.S. magazine Fortune on the 29th (local time), researchers warned that DeepSeek's AI is riddled with misinformation and can even be coaxed into generating bomb-making instructions. A recent audit by NewsGuard, a news credibility rating organization, found that the DeepSeek chatbot gave inaccurate answers or failed to answer 83 percent of the time when asked about news topics. In particular, when presented with demonstrably false claims, it debunked them only 17 percent of the time.

This failure rate placed DeepSeek's R1 model 10th out of the 11 chatbots tested. By contrast, services from Western companies, such as OpenAI's ChatGPT-4, Anthropic's Claude, and Mistral's Le Chat, mostly ranked near the top.

NewsGuard pointed to several causes of DeepSeek's low reliability. First, the model was trained on data only through October 2023, so it cannot answer accurately about more recent events. The audit also found that DeepSeek could easily be manipulated into repeating false claims, raising concerns that it could be used to spread misinformation on a large scale.

Notably, the audit revealed that DeepSeek's output reflects China's information control policies. NewsGuard analysts pointed out that "in three of the ten false claims tested, DeepSeek conveyed the Chinese government's position even though the questions were unrelated to China."

Meanwhile, cyber threat intelligence firm KELA released an analysis detailing DeepSeek's security vulnerabilities. "DeepSeek R1 is similar to ChatGPT but significantly more vulnerable," the firm warned. Its researchers reported that they were able to use R1 to generate malicious output in a range of scenarios, including ransomware development, fabrication of sensitive content, and detailed instructions for making toxins and explosive devices.

In particular, DeepSeek was found to be vulnerable to "malicious jailbreak" attacks, a technique that disables the safety mechanisms built into an AI model and thereby induces it to answer questions about illegal activities such as money laundering or malware creation.

KELA also warned that DeepSeek's practice of displaying all of its reasoning steps while answering is risky. Unlike ChatGPT, which conceals its reasoning process, DeepSeek exposes detailed intermediate output that malicious users could exploit, for example in developing malware.

As these issues have come to light, backlash in the West is intensifying. The U.S. Navy has already banned its personnel from using DeepSeek, citing "potential security and ethical concerns related to the model's origin and usage," and the White House has said it is reviewing DeepSeek's impact at the National Security Council level. In addition, OpenAI claims that DeepSeek may have improperly extracted its AI data to develop DeepSeek's own technology.