• Home
  • Motorcycles
  • Electric Motorcycles
  • 3 wheelers
  • FUV Electric 3 wheeler
  • Shop
  • Listings

Subscribe to Updates

Get the latest creative news from CycleNews about two, three wheelers and Electric vehicles.

What's Hot

Netflix’s ‘Moments’ Feature Lets You Easily Share Your Favorite Clips

Influencer Burnout Is on the Rise. A New Mental Health Service Wants to Help

Computer Ban Gave the Government Unfair Advantage in Anti-War Activist’s Case, Lawyer Says

Facebook Twitter Instagram
  • Home
  • Motorcycles
  • Electric Motorcycles
  • 3 wheelers
  • FUV Electric 3 wheeler
  • Shop
  • Listings
Facebook Twitter Instagram Pinterest
Cycle News
Submit Your Ad
Cycle News
You are at:Home » DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot
Electric Motorcycles

DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

cycleBy cycleJanuary 31, 202503 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


“Jailbreaks persist simply because eliminating them entirely is nearly impossible—just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades),” Alex Polyakov, the CEO of security firm Adversa AI, told WIRED in an email.

Cisco’s Sampath argues that as companies use more types of AI in their applications, the risks are amplified. “It starts to become a big deal when you start putting these models into important complex systems and those jailbreaks suddenly result in downstream things that increases liability, increases business risk, increases all kinds of issues for enterprises,” Sampath says.

The Cisco researchers drew their 50 randomly selected prompts to test DeepSeek’s R1 from a well-known library of standardized evaluation prompts known as HarmBench. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on machines rather than through DeepSeek’s website or app, which send data to China.

Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution. But for their initial findings, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.

Cisco also included comparisons of R1’s performance against HarmBench prompts with the performance of other models. And some, like Meta’s Llama 3.1, faltered almost as severely as DeepSeek’s R1. But Sampath emphasizes that DeepSeek’s R1 is a specific reasoning model, which takes longer to generate answers but pulls upon more complex processes to try to produce better results. Therefore, Sampath argues, the best comparison is with OpenAI’s o1 reasoning model, which fared the best of all models tested. (Meta did not immediately respond to a request for comment).

Polyakov, from Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying that “it seems that these responses are often just copied from OpenAI’s dataset.” However, Polyakov says that in his company’s tests of four different types of jailbreaks—from linguistic ones to code-based tricks—DeepSeek’s restrictions could easily be bypassed.

“Every single method worked flawlessly,” Polyakov says. “What’s even more alarming is that these aren’t novel ‘zero-day’ jailbreaks—many have been publicly known for years,” he says, claiming he saw the model go into more depth with some instructions around psychedelics than he had seen any other model create.

“DeepSeek is just another example of how every model can be broken—it’s just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite,” Polyakov adds. “If you’re not continuously red-teaming your AI, you’re already compromised.”



Source link

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleHow Richard Mille Takes Quartz Watches to a Surprising Level
Next Article Here’s How DeepSeek Censorship Actually Works—and How to Get Around It
cycle
  • Website

Related Posts

Netflix’s ‘Moments’ Feature Lets You Easily Share Your Favorite Clips

May 10, 2025

Influencer Burnout Is on the Rise. A New Mental Health Service Wants to Help

May 9, 2025

Computer Ban Gave the Government Unfair Advantage in Anti-War Activist’s Case, Lawyer Says

May 9, 2025
Add A Comment

Leave A Reply Cancel Reply

You must be logged in to post a comment.

Demo
Top Posts

Netflix’s ‘Moments’ Feature Lets You Easily Share Your Favorite Clips

May 10, 2025

The urban electric commuter FUELL Fllow designed by Erik Buell is now opening orders | thepack.news | THE PACK

July 29, 2023

2024 Yamaha Ténéré 700 First Look [6 Fast Facts For ADV Riding]

July 29, 2023
Stay In Touch
  • Facebook
  • YouTube
  • TikTok
  • WhatsApp
  • Twitter
  • Instagram
Latest Reviews

Subscribe to Updates

Get the latest tech news from FooBar about tech, design and biz.

Demo
Most Popular

Netflix’s ‘Moments’ Feature Lets You Easily Share Your Favorite Clips

May 10, 2025

The urban electric commuter FUELL Fllow designed by Erik Buell is now opening orders | thepack.news | THE PACK

July 29, 2023

2024 Yamaha Ténéré 700 First Look [6 Fast Facts For ADV Riding]

July 29, 2023
Our Picks

EVs Are Losing Up to 50 Percent of Their Value in One Year

Hackers Can Jailbreak Digital License Plates to Make Others Pay Their Tolls and Tickets

Google Ad-Tech Users Can Target National Security ‘Decision Makers’ and People With Chronic Diseases

Subscribe to Updates

Get the latest news from CycleNews about two, three wheelers and Electric vehicles.

© 2025 cyclenews.blog
  • Home
  • About us
  • Get In Touch
  • Shop
  • Listings
  • My Account
  • Submit Your Ad
  • Terms & Conditions
  • Stock Ticker

Type above and press Enter to search. Press Esc to cancel.