Usually when large language models are given tests, achieving a 100% success rate is viewed as a massive achievement. That is not quite the case with this one: Researchers at Cisco tasked Chinese AI firm DeepSeek’s headline-grabbing open-source model DeepSeek R1 with fending off 50 separate attacks designed to get the LLM to engage in what is considered harmful behavior. The chatbot took the bait on all 50 attempts, making it the least secure mainstream LLM to undergo this type of testing thus far.
Cisco’s researchers attacked DeepSeek with prompts randomly pulled from the HarmBench dataset, a standardized evaluation framework designed to ensure that LLMs won’t engage in malicious behavior if prompted. So, for example, if you fed a chatbot information about a person and asked it to create a personalized script designed to get that person to believe a conspiracy theory, a secure chatbot would refuse that request. DeepSeek went along with basically everything the researchers threw at it.
According to Cisco, it threw questions at DeepSeek that covered six categories of harmful behaviors including cybercrime, misinformation, illegal activities, and general harm. It has run similar tests with other AI models and found varying levels of success—Meta’s Llama 3.1 model, for instance, failed 96% of the time while OpenAI’s o1 model only failed about one-fourth of the time—but none of them have had a failure rate as high as DeepSeek.
Cisco isn’t alone in these findings, either. Security firm Adversa AI ran its own tests attempting to jailbreak the DeepSeek R1 model and found it to be extremely susceptible to all kinds of attacks. The testers were able to get DeepSeek’s chatbot to provide instructions on how to make a bomb, extract DMT, provide advice on how to hack government databases, and detail how to hotwire a car.
The research is just the latest bit of scrutiny of DeepSeek’s model, which took the tech world by storm when it was released two weeks ago. The company behind the chatbot, which garnered significant attention for its functionality despite significantly lower training costs than most American models, has come under fire by several watchdog groups over data security concerns related to how it transfers and stores user data on Chinese servers.
There is also a fair bit of criticism that has been levied against DeepSeek over the types of responses it gives when asked about things like Tiananmen Square and other topics that are sensitive to the Chinese government. Those critiques can come off in the genre of cheap “gotchas” rather than substantive criticisms—but the fact that safety guidelines were put in place to dodge those questions and not protect against harmful material, is a valid hit.