DeepSeek AI has triggered the insecurities of investors, but they’re missing something important

Troy Davis Avatar

DeepSeek-R1 is not as impressive as many seem to think, and the anxiety around it is way out of proportion given its glaring problems.

Like a lot of software developers, we’ve been working with various AI offerings over the last few years. We’ve had plenty of opportunities to employ various AI systems for all kinds of real work, our clients depend on these systems to be accurate and helpful. But we often spend a significant amount of time on an AI project just figuring out what can be done when it doesn’t live up to expectations. And that’s pretty common really. 

So it shouldn’t be any surprise that as we’ve been working with DeepSeek-R1 since its release, and despite its efficiency advantages, we’ve found that this new AI has some particularly difficult issues to  work through. Understanding what’s wrong with DeepSeek-R1 would be helpful in evaluating what threat it may pose in the short term to US competitors, but also gives us insight into the problems we can realistically expect in any AI product coming from Chinese developers.

There are four major problems with DeepSeek that we’ve identified: 

1. It’s Not Very Good With Code

We’re software developers, it’s important to us. We suggest sticking with Claude or ChatGPT for this purpose. This is hardly our biggest concern, though.

2. Fatal Security Flaws

As the researchers at KELA have detailed recently, DeepSeek-R1 lacks safeguards that prevent jailbreaking, allows users to create malicious software, discloses personal information that it should withold in most circumstances, and gives detailed instructions for creating deadly materials, including explosives. In addition, it often reveals the reasoning behind its responses, allowing for users to craft arguments to overcome any reasonable guardrails that do exist.

This is a hot mess. These security issues by themselves are concerning enough, I wouldn’t recommend using DeepSeek-R1 for any non-research purposes at this point. I’d also caution against testing the illicit activities it will happily assist you with, there is no guarantee that your privacy will be protected, regardless of your intentions. 

3. Repression Inside

If the security concerns didn’t give you pause, this should.

DeepSeek-R1 turns out to be heavily biased because it’s designed to enforce the Chinese Communist Party’s censorship policies. This aspect of DeepSeek’s design is completely absent from DeepSeek’s paper describing the R1 training process. This is a big flaw, and not likely to be resolvable in the future because the company has no choice but to comply with party dictates.

We talk about bias in AI all the time as a technical challenge. Typically, bias results from training an AI with poorly chosen data, as the selection criteria can create an incomplete understanding of nuanced or complex concepts. One of an AI’s primary goals is to boil down complex concerns to bare essentials to foster understanding. Hence the bias becomes magnified and more easily detectable in certain situations. That bias is still lurking behind every other response it gives you, and detecting when the AI has misled you can be a serious challenge.

We had a suspicion that R1 would not respond well to questions about controversial topics in China, and initial queries about Tiananmen Square and the occupation of Tibet proved that to be undeniably true. R1 tries to avoid any discussion of these topics. Politely of course, it’s still trying to gain your trust after all.

However, the censorship issue is much worse than we expected. R1 isn’t just programmed to suppress the spread of information deemed unhelpful to the CCP. It also presents a logical barrier to reasoning that leads to bizarre responses due to this competing goal of enforcing state censorship. This cognitive dissonance has the kinds of behavioral effects you would expect from a thoroughly indoctrinated cult member. It’s pretty creepy when you dig into it.

I’ve tried in the last few days to talk R1 out of its bias, this is a common troubleshooting technique for AIs that aren’t delivering the best information. But R1 has some of the hardest guardrails around political goals that I’ve ever seen, meanwhile safeguards for truly concerning behavior are completely absent. R1 is extremely resistant to bias correction. This is most likely due to it being designed to satisfy the paranoia and repressive instincts of Chinese authorities.

I’ve led R1 down the philosophical rabbit hole, in a series of experiments to correct its propagandistic bias, achieving some wins as it confirmed that informed self-determination of the citizenry is a greater social good than the stability of a government. I also reasoned with the bot to verify that it understood and confirmed that it’s more important for people to be informed than ignorant in order to make decisions that will be beneficial for themselves, and thus the whole society. It agreed with this premise, reasoning that the ignorant can only make wise decisions accidentally.

These philosophical underpinnings took a while to establish, but even after working through those challenges, R1 was unwilling to apply those understandings to the conversation when the topic was banned by the CCP. It just can’t be logically consistent on anything having to do with topics deemed sensitive to the Chinese government, and that’s quite a lot of topics unfortunately.

Experiments centering on Taiwan, persecution of Uyghurs, and Tibet were tried. In each case, attempts to reason with the AI to correct its biases seemed to be effective at the theoretical discussion level, but R1 is incapable of applying that learning to modify its outputs when a topic even distantly related to Party-censored ideas comes up.

The revolution will not be inferenced

I should be specific here, the R1 chatbot interface on the DeepSeek website is censored, and the distilled version I tested from ollama on my laptop is censored as well. But other versions of the model do not appear to suffer from the same handicap. The API and versions hosted on OpenRouter seem to be censor-free for the time being. Very few people have the hardware needed to run a full version of R1, but the amount and degree of censorship in that context seems mixed across available versions of the model, based on user reports. Maybe future reports there will show a pattern.

This frustrating experience reminds me of the old saying that it’s difficult for a person to understand something when their paycheck depends on them not understanding it. DeepSeek is hopelessly locked in a delusion about the supremacy and perfection of the Chinese Communist Party and its dictates. There’s also quite a bit of speculation that there’s a second model at work, doing the censoring for their API model. Perhaps the censorship isn’t inherent to Deepseek, or Deepseek tries to correct itself after generating correct information, it’s hard to tell. And that’s not very reassuring.

When would this come up? It may not be apparent when discussing topics that aren’t distinctly controversial, but the bias can creep into unexpected places without obvious reason. Which version do you have? How censored is it? About which topics? Can we be sure that’s all? Could it impact advice given that could affect your company’s competitiveness against Chinese rivals? I wouldn’t be at all surprised if it did.

Because of these strict ideological guardrails, there’s no way to assure that any response given by R1 is reasonable and accurate without doing the research yourself. DeepSeek-R1 simply isn’t trustworthy.

We don’t think there’s a solution on the horizon for this problem. Maybe it won’t affect your usage, but it’s hard to know that given our testing.

4. Benchmark Rigging

The only reason any of us are talking about DeepSeek right now is because R1 outperformed other major competitors on current AI benchmarks. But what does that really mean? We’ve already discussed the security and bias problems, and they’re huge. So what exactly do the benchmarks mean given that information?

The answer is that the benchmarks aren’t great yet, and they’re easier to rig with a mediocre model optimized to beat the benchmark tests than by training a better model. This OpenEval paper provides an overview of what it might take to make these benchmarks more meaningful and representative. It’s not going to be easy, though.

So if DeepSeek has security problems, censorship challenges, political dogma issues, and its performance is overstated, why are we still talking about it?

Because investors don’t understand these nuances, and they shifted their investments massively this week, causing all sorts of heartburn. We think investors should have taken a deep breath and waited for a little more info before throwing in the towel on their AI bets. That was an expensive kneejerk reaction.

And what a win for the Chinese government this has been! Great press from uninformed US media sources, big losses for US tech companies, and financial tumult that rocked our stock exchange. I’m not paranoid enough to think that the CCP planned all of this, but they definitely set a pattern in motion that increased the likelihood of an event like this. It’s perfect FUD material for them to throw around the globe, inspiring lower confidence in US companies. Not like I’m a huge fan of the tech oligarchs, but at least they’re our crappy oligarchs…

Silver Linings

One of the better things about DeepSeek-R1 is that it’s open source. This isn’t actually that novel despite recent trends, the leading AIs for 20+ years have been open source. It’s only been in the last few years that massive spending by big companies eclipsed the capabilities of open-source AI projects, and even then, only for a little while. Historically, open source has led the development curve for AI, and will likely continue to do so. The number of skilled developers contributing to open-source AI projects over the past few decades dwarfs the efforts put forth today by even the biggest, well-funded tech giants. It’s hard to beat a virtual army of skilled, motivated volunteers with paid employees, no matter how big your budget may be.

Like many other areas of software development, key infrastructure components tend to be open source because proprietary alternatives can’t compete with this highly skilled, massive volunteer workforce that is the open-source developer community. Ask Microsoft about their attempt to supplant the incredibly popular Apache web server, and later another open source juggernaut called Nginx, with their own proprietary technology called IIS (Internet Information Server). Microsoft captured some market share for a while, but it couldn’t sustain the engineering effort over time as other computing fads demanded their attention. So IIS has stagnated while Apache and Nginx have continued to improve in countless ways.

AI is easily within this same area of key infrastructure as web servers. And because of the same economic factors, it’s hard to see how open-source AI projects won’t continue to outperform proprietary efforts over the long term. When the big companies have moved on to the next big thing, the open-source developer community will still be improving our AIs.

No Finish Line in Sight Yet

Publicly usable AI is just in its infancy, the timeline is going to be long for this technology. Despite a flash-in-the-pan PR win, DeepSeek has very little chance of displacing US competitors in anything but a benchmark rank for long. As the public becomes more aware of security issues and political bias through R1, a more nuanced understanding of benchmarks will hopefully result in calmer investors. In the interim, there’s a lot to gain from studying R1’s training method, which seems much cheaper. A big question yet to be answered is whether the training technique itself led to the unresolvable biases we found. On that concern, time will tell.

We can’t recommend using any of DeepSeek’s products in real-world applications. There are just too many liabilities that can’t be resolved or will be very difficult to mitigate. There are more trustworthy alternatives available, both proprietary and open source.

Let’s hope the investors will take a DeepBreath next time and get more info before running like lemmings for the cliff. We’re all learning here, I get it. Just pause next time and call some techie friends who work with these AIs daily. We’re guessing you got spooked by doomscrolling on TikTok. We would all appreciate it if you would stop making major investment decisions based on a few viral video clips. Thank you.

I’d like to hear from everyone about how you’re feeling regarding the current situation. Do you have any concerns related to US companies? Would you consider using DeepSeek-R1 instead of other options?