Chinese AI researchers have achieved what many thought was light years away: a free, open-source AI model that matches or exceeds the performance of OpenAI's most advanced reasoning systems. What makes this even more remarkable is how they did it: by letting the AI teach itself through trial and error, similar to how humans learn.

“DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities,” the research paper reads.

“Reinforcement learning” is a method in which a model is rewarded for making good decisions and penalized for making bad ones, without being told in advance which is which. Over many trials, it learns to favor the paths that those rewards reinforced.

Initially, during the supervised fine-tuning phase, humans show the model examples of the desired output, giving it a baseline for what counts as good. That leads to the next phase, reinforcement learning, in which the model produces different outputs and humans rank the best ones. The process is repeated over and over until the model consistently provides satisfactory results.
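The trial-and-error loop described above can be sketched in a few lines. Everything here is a toy illustration, not DeepSeek's actual training code: the "model" is just a table of weighted candidate answers, and the reward rule is made up for clarity.

```python
import random

def reward(answer: str, correct: str) -> int:
    """Reinforcement signal: +1 for a good outcome, -1 for a bad one."""
    return 1 if answer == correct else -1

# A toy "model": candidate answers with learnable preference weights.
candidates = ["7", "8", "9"]
weights = {c: 0.0 for c in candidates}

def sample_answer() -> str:
    # Prefer answers with the highest weight (random tie-breaking).
    best = max(weights.values())
    return random.choice([c for c, w in weights.items() if w == best])

# Trial and error: the model is never shown the right answer directly;
# it only sees the reward after each attempt.
for _ in range(50):
    answer = sample_answer()
    weights[answer] += reward(answer, correct="8")

print(max(weights, key=weights.get))  # the reinforced answer
```

After a few penalized guesses, the correct answer is the only one left with the top weight, and each subsequent reward entrenches it: the essence of a reinforcement loop, without any labeled examples.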

Image: Deepseek

DeepSeek R1 marks a departure in AI development because humans play only a minimal part in the training. Unlike other models that are trained on vast amounts of supervised data, DeepSeek R1 learns primarily through pure reinforcement learning, essentially figuring things out by experimenting and getting feedback on what works.

"Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and interesting reasoning behaviors," the researchers said in their paper. The model even developed sophisticated capabilities like self-verification and reflection without being explicitly programmed to do so.

As the model went through its training process, it naturally learned to allocate more "thinking time" to complex problems and developed the ability to catch its own mistakes. The researchers highlighted an "a-ha moment" where the model learned to reevaluate its initial approaches to problems—something it wasn't explicitly programmed to do.

The performance numbers are impressive. On the AIME 2024 mathematics benchmark, DeepSeek R1 achieved a 79.8% success rate, surpassing OpenAI's o1 reasoning model. On standardized coding tests, it demonstrated "expert level" performance, achieving a 2,029 Elo rating on Codeforces and outperforming 96.3% of human competitors.

Image: Deepseek

But what really sets DeepSeek R1 apart is its cost—or lack thereof. The model runs queries at just $0.14 per million tokens compared to OpenAI's $7.50, making it 98% cheaper. And unlike proprietary models, DeepSeek R1's code and training methods are completely open source under the MIT license, meaning anyone can grab the model, use it and modify it without restrictions.
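Taking the article's quoted prices at face value, the gap compounds quickly at scale. A back-of-the-envelope sketch (the per-million-token rates are the ones cited above; the workload size is an arbitrary example):

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
deepseek_per_million = 0.14   # USD per million tokens (as reported)
openai_per_million = 7.50     # USD per million tokens (as reported)

tokens = 100_000_000  # example workload: 100M tokens

deepseek_cost = tokens / 1_000_000 * deepseek_per_million
openai_cost = tokens / 1_000_000 * openai_per_million

savings_pct = (1 - deepseek_per_million / openai_per_million) * 100
print(f"DeepSeek: ${deepseek_cost:.2f}, OpenAI: ${openai_cost:.2f}, "
      f"~{savings_pct:.0f}% cheaper")
```

On that example workload, the quoted rates work out to $14 versus $750, which is where the roughly 98% figure comes from.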

Image: Deepseek

AI leaders react

The release of DeepSeek R1 has triggered an avalanche of responses from AI industry leaders, with many highlighting the significance of a fully open-source model matching proprietary leaders in reasoning capabilities.

Nvidia's top researcher Dr. Jim Fan delivered perhaps the most pointed commentary, drawing a direct parallel to OpenAI's original mission. "We are living in a timeline where a non-U.S. company is keeping the original mission of OpenAI alive—truly open frontier research that empowers all," Fan noted, praising DeepSeek's unprecedented transparency.

Fan called out the significance of DeepSeek's reinforcement learning approach: "They are perhaps the first [open source software] project that shows major sustained growth of [a reinforcement learning] flywheel." He also lauded DeepSeek's straightforward sharing of "raw algorithms and matplotlib learning curves" versus the hype-driven announcements more common in the industry.

Apple researcher Awni Hannun mentioned that people can run a quantized version of the model locally on their Macs.

Traditionally, Apple devices have been weak at AI due to their lack of compatibility with Nvidia’s CUDA software, but that appears to be changing. For example, AI researcher Alex Cheema managed to run the full model on a cluster of eight Apple Mac Mini units, which is still cheaper than the servers required to run the most powerful AI models currently available.

That said, users can run lighter versions of DeepSeek R1 on their Macs with good levels of accuracy and efficiency.

The most interesting reactions, however, concerned how close the open-source ecosystem has come to the proprietary models, and the potential impact this development may have on OpenAI as the leader in the field of reasoning AI models.

Stability AI's founder Emad Mostaque took a provocative stance, suggesting the release puts pressure on better-funded competitors: "Can you imagine being a frontier lab that's raised like a billion dollars and now you can't release your latest model because it can't beat DeepSeek?"

Following the same reasoning but with a more serious argument, tech entrepreneur Arnaud Bertrand explained that the emergence of a competitive open-source model could hurt OpenAI by making its models less attractive to power users who might otherwise be willing to spend a lot of money per task.

“It's essentially as if someone had released a mobile on par with the iPhone, but was selling it for $30 instead of $1000. It's this dramatic.”

Perplexity AI's CEO Aravind Srinivas framed the release in terms of its market impact: "DeepSeek has largely replicated o1 mini and has open-sourced it." In a follow-up observation, he noted the rapid pace of progress: "It's kind of wild to see reasoning get commoditized this fast."

Srinivas said his team will work to bring DeepSeek R1’s reasoning capabilities to Perplexity Pro in the future.

Quick hands-on

We did a few quick tests to compare the model against OpenAI o1, starting with a well-known question for these kinds of benchmarks: “How many Rs are in the word Strawberry?”

Typically, models struggle to provide the correct answer because they don’t work with words—they work with tokens, numerical representations of chunks of text rather than individual letters.
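A rough way to see why: below is a hypothetical token split for "strawberry" (the subword boundaries and IDs are made up for illustration; real tokenizers differ), showing that the model receives IDs, not letters.

```python
# Illustrative only: a hypothetical subword split for "strawberry".
# Real tokenizers vary, but the point stands: the model sees token IDs,
# not individual characters.
tokens = ["str", "aw", "berry"]      # hypothetical subword split
token_ids = [1042, 675, 8931]        # hypothetical IDs the model would see

# Counting letters is trivial on the raw string...
print("strawberry".count("r"))  # → 3

# ...but the model never receives the string; it receives the IDs,
# where letter-level information is not directly accessible.
print(token_ids)
```

Answering "how many Rs" therefore requires the model to reason its way back to the spelling, which is exactly the kind of multi-step reflection the test probes.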

GPT-4o failed, OpenAI o1 succeeded—and so did DeepSeek R1.

However, o1 was very concise in its reasoning process, whereas DeepSeek produced a much lengthier reasoning output. Interestingly enough, DeepSeek’s answer felt more human. During the reasoning process, the model appeared to talk to itself, using slang and expressions that are uncommon in machine output but widely used by humans.

For example, while reflecting on the number of Rs, the model said to itself, “Okay, let me figure (this) out.” It also used “Hmmm,” while debating, and even said things like “Wait, no. Wait, let’s break it down.”

The model eventually reached the correct result, but spent a lot of time reasoning and generating tokens. Under typical pricing conditions this would be a disadvantage, but given DeepSeek's rates, it can output far more tokens than OpenAI o1 and still be cost-competitive.

Another test to see how good the models were at reasoning was to play “spies” and identify the perpetrators in a short story. We chose a sample from the BIG-bench dataset on GitHub. (The full story is available here and involves a school trip to a remote, snowy location, where students and teachers face a series of strange disappearances and the model must find out who the stalker was.)

Both models thought about it for over one minute. However, ChatGPT crashed before solving the mystery.

But DeepSeek gave the correct answer after “thinking” about it for 106 seconds. The thought process was correct, and the model was even capable of correcting itself after arriving at incorrect (but still logical enough) conclusions.

The accessibility of smaller versions particularly impressed researchers. For context, a 1.5B model is so small, you could theoretically run it locally on a powerful smartphone. And even a quantized version of DeepSeek R1 that small was able to hold its own against GPT-4o and Claude 3.5 Sonnet, according to Hugging Face’s data scientist Vaibhav Srivastav.

Just a week ago, UC Berkeley’s NovaSky team released Sky-T1, a reasoning model also capable of competing against OpenAI o1-preview.

Those interested in running the model locally can download it from GitHub or Hugging Face. Users can download it, run it, remove the censorship, or adapt it to different areas of expertise by fine-tuning it.

Or if you want to try the model online, go to Hugging Chat or DeepSeek’s web portal, a good alternative to ChatGPT, especially since it’s free, open source, and, besides ChatGPT, the only AI chatbot interface with a model built for reasoning.

Edited by Andrew Hayward
