DeepSeek’s AI Breakthrough: Cheaper, Faster, but at What Cost?

February 11, 2025

Chinese startup DeepSeek recently made quite the splash in tech news with their large language model (LLM) DeepSeek-R1. This open-source model is quickly catching up in capabilities to popular models, such as those from OpenAI. DeepSeek claims that DeepSeek-R1 was trained at a far lower cost than other models.

What does it all mean?

Defining Terms: What is a Model?

According to IBM, “an AI model is a program that has been trained on a set of data to recognize certain patterns and make certain decisions without human intervention.” It works by using algorithms against large amounts of data.

A large language model (LLM) is an AI model that has been trained on a huge amount of text from books, websites, and other sources to understand and generate human-like text. You’ve probably interacted with an LLM by asking it questions or giving it prompts. The LLM can respond with relevant information, write essays, have conversations, and more. It's been programmed to understand the context and nuances of language to provide helpful and coherent answers.
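To make "generate human-like text" concrete, here is a toy sketch. Everything in it is illustrative, not a real model: an LLM predicts the next token using billions of learned parameters, while this sketch plays the same role with a tiny hand-written probability table, generating text one word at a time.

```python
import random

# Toy stand-in for an LLM: P(next word | current word) over a tiny vocabulary.
# A real model learns these probabilities from huge amounts of text.
model = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(start, max_words):
    """Generate text one word at a time, sampling from the model."""
    words = [start]
    for _ in range(max_words):
        dist = model.get(words[-1])
        if dist is None:  # no known continuation: stop generating
            break
        next_word = random.choices(list(dist), weights=list(dist.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the", 3))  # e.g. "the cat sat down" or "the dog ran away"
```

The loop is the whole trick: predict, append, repeat. Scale the table up to a neural network trained on the internet and you have the core of an LLM.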

IBM explains it like this: LLMs are “defined by [their] ability to autonomously make decisions or predictions, rather than simulate human intelligence."

Examples of popular LLMs include OpenAI’s GPT-3 and GPT-4, Google’s BERT and T5, Facebook’s RoBERTa, and DeepSeek-R1.

What is a GPU?

GPUs, or graphics processing units, are hardware processing cards originally developed to render graphics for video games and virtual desktops. Researchers realized that the intense parallel math required to deliver stellar graphics was the same math data scientists use to build AI models. Running that math on GPUs sped up the systems hosting the models, which led to major performance improvements.
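The shared math alluded to above is mostly matrix multiplication. A minimal pure-Python sketch (all numbers illustrative) shows the same operation serving both worlds:

```python
# The same operation underlies 3D graphics and neural networks: multiplying
# matrices. GPUs run thousands of these multiply-adds in parallel.

def matmul(a, b):
    """Naive matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# Graphics use: rotate a 2D vertex 90 degrees.
rotate90 = [[0, -1], [1, 0]]
print(matmul(rotate90, [[1], [0]]))   # -> [[0], [1]]

# ML use: one layer of a tiny neural network (weights @ inputs).
weights = [[0.5, -0.2], [0.1, 0.8]]
print(matmul(weights, [[1], [0]]))    # -> [[0.5], [0.1]]
```

A GPU does not change the arithmetic; it simply performs millions of these independent multiply-adds at once, which is why the same chip excels at both workloads.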

Not surprisingly, the prospect of training a model that required fewer GPUs shook up the industry! In fact, the industry was so shaken that NVIDIA lost nearly $600B in market value the day DeepSeek made their announcement.

What's Behind the Excitement about DeepSeek?

One major reason for excitement over the new model is that DeepSeek claims to have built an open-source model that does what OpenAI’s models do, but at a fraction of the cost. To be precise, the company claimed it cost only $6 million and 2,048 GPUs to train. For comparison, the cost of GPT-4 was estimated at between $50 and $100 million.

GPUs are one of the biggest expenses required to train models. Additionally, since everyone is racing to be part of the AI revolution, GPUs are hard to find, as vendors like NVIDIA are having a hard time keeping up with demand.

On October 7, 2022, U.S. export controls on the sale of advanced GPUs to China took effect. That meant DeepSeek needed to find a way to train a model without leaning on the latest, most powerful hardware accelerators. And they figured it out with some ingenious computer science.

Technical Achievements Unlocked

DeepSeek also caused a stir because of technical achievements that allowed the team to train their model with fewer GPUs. According to The Register, DeepSeek R1 was fine-tuned from their V3 model that was released at the end of last year.

The team combined several techniques in how the model was trained:

  • Chain of Thought (CoT) prompting: This requires the model to show the steps it goes through to answer a prompt. Showing this “reasoning” makes the model’s responses more reliable, more transparent, and easier to debug.  
  • Reinforcement learning: A machine learning technique in which the software is rewarded for decisions that move toward your goal, while unhelpful actions earn no reward. Over many trials, the model learns which behaviors pay off.  
  • Model distillation: Another machine learning technique. It transfers knowledge from a bigger system (the teacher model) to a smaller one (the student model). The student gets the benefit of the teacher’s lessons without the full training investment made in the larger model.  
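Of the three, distillation is the easiest to see in a few lines of code. The sketch below is a minimal illustration of the idea, not DeepSeek's actual recipe, and every number in it is made up: the student is trained to match the teacher's *softened* output probabilities, which carry more information than a hard right/wrong label.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature = softer."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The teacher's raw scores for 3 classes (illustrative numbers).
teacher_logits = [4.0, 1.5, 0.5]
soft_targets = softmax(teacher_logits, temperature=2.0)

# The student starts knowing nothing and is trained, by gradient descent on
# cross-entropy, to reproduce the teacher's soft distribution.
student_logits = [0.0, 0.0, 0.0]
lr = 1.0
for _ in range(300):
    probs = softmax(student_logits)
    # Gradient of cross-entropy w.r.t. logits is simply (probs - targets).
    grads = [p - t for p, t in zip(probs, soft_targets)]
    student_logits = [z - lr * g for z, g in zip(student_logits, grads)]

# After training, the student's distribution tracks the teacher's.
print([round(p, 2) for p in softmax(student_logits)])
print([round(t, 2) for t in soft_targets])
```

Real distillation does this over millions of examples and full neural networks, but the objective is the same: make the small model's outputs mimic the big model's.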

Mind the Hype

When you’re evaluating AI models, you have to mind the hype. The most obvious place to start is the $6 million DeepSeek claims it cost to train this model. Against the high end of GPT-4’s estimated cost, that is a 94% decrease!
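It's worth checking the headline arithmetic yourself, using the figures already quoted in this article ($6 million claimed, against a $50 million to $100 million estimate for GPT-4):

```python
# Percent decrease of DeepSeek's claimed cost vs. GPT-4's estimated range.
claimed = 6  # millions of dollars, as claimed by DeepSeek
for estimate in (50, 100):
    decrease = (estimate - claimed) / estimate
    print(f"vs ${estimate}M estimate: {decrease:.0%} lower")
```

So the "94%" holds only against the top of the estimated range; against the low end it is closer to 88%. Either way, it is a dramatic claim, which is exactly why it deserves scrutiny.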

The independent research and analysis firm SemiAnalysis looked at the probable total cost of training this model. They don’t believe the $6M price tag, saying: “This is akin to pointing to a specific part of a bill of materials for a product and attributing it as the entire cost.”

They believe the $6M figure covers only pre-training, and that the company’s hardware spend has surpassed $500M over its history. There is also the cost of the teams who spent months developing and testing the new ideas and configurations that went into the model.

Dangers of this Model

There are a few things to be wary of if you plan to use this model. First, because the model was trained with distillation, it was trained partly on synthetic data, that is, data generated by another model. Is that data correct? Probably as correct as any LLM’s output can be.

Also remember that DeepSeek is a Chinese company, which means it is subject to its country’s laws and regulations. Because of that, the model won’t answer questions about the Tiananmen Square massacre or the Hong Kong pro-democracy protests, for instance.

While DeepSeek, Meta, and OpenAI all say they collect data from account information, activities on the platform, and the devices users are on, DeepSeek also collects “keystroke patterns or rhythms, which can be as uniquely identifying as a fingerprint or facial recognition and used as a biometric.”

Things to Ponder…

The DeepSeek announcement was something to get excited about. A company has figured out how to train its model faster and more efficiently, making it more affordable. However, don’t get caught up in the hype. Dig into statements that seem improbable; they are probably focusing on just one part of the story. If you don’t understand the vocabulary, look up the words or ask an expert. Here is an overview of how LLMs work from a session I presented at VMworld.

Always remember that AI is just computer science, and not magic!

