We finally know how much it cost to train China's amazing DeepSeek model

Remember when DeepSeek briefly rocked the entire artificial intelligence industry by launching its large language model, R1, which was trained for a fraction of the money that OpenAI and other big players poured into their models? Thanks to a new paper published by the DeepSeek AI team in the journal Nature, we finally know what it took to train DeepSeek R1: $294,000 and 512 Nvidia H800 chips. The reason it was able to spend so little, it seems, comes down to the team's use of trial-and-error reinforcement learning techniques.
Most AI models tasked with performing reasoning work have to be trained on human-annotated data and demonstrations to “learn” how to solve certain problems, which is both expensive and time-consuming to scale as models take on harder tasks. DeepSeek found that it could improve the reasoning and outputs of its model simply by incentivizing it to perform a trial-and-error process until it arrived at the right answer.
In an article accompanying the paper, Carnegie Mellon University assistant professor Daphne Ippolito and doctoral student Yiming Zhang explain the reinforcement method by comparing it to a child playing a video game: “As the child navigates their avatar through the game world, they learn through trial and error that certain actions (such as collecting gold coins) earn points. In a similar vein, DeepSeek-R1 was given a high score when it answered questions correctly and a low score when it gave wrong answers.”
Previous research has shown that using a prompting approach, asking an LLM to provide a step-by-step explanation of how it works out its answer, produces more accurate responses. But the DeepSeek team found a way to get better answers through reinforcement, by assigning a scoring system to the outputs R1 produced. This works particularly well with math and programming questions, which typically have a verifiably correct answer. By using this method instead of human-guided reasoning, the LLM was able to reach a correct conclusion on its own as it sought out higher scores.
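To make the scoring idea concrete, here is a minimal, hypothetical sketch in Python of how a rule-based reward can grade sampled outputs for a question with one verifiable answer. This is not DeepSeek's actual code; the `reward` and `score_samples` helpers are illustrative names, and a real pipeline would feed these scores into a reinforcement learning update rather than print them:

```python
# A minimal sketch of rule-based reward scoring for verifiable tasks
# (math, programming). Illustrative only, not DeepSeek's training code.

def reward(model_answer: str, ground_truth: str) -> float:
    """High reward if the final answer checks out, low reward otherwise."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def score_samples(samples: list[str], ground_truth: str) -> list[tuple[str, float]]:
    """Attach a reward to each sampled answer.

    In reinforcement learning, these scores become the training signal:
    the model is nudged toward outputs that earn high rewards, with no
    human-written reasoning demonstrations required.
    """
    return [(s, reward(s, ground_truth)) for s in samples]

# Example: three answers a model might sample for "What is 12 * 7?"
for answer, r in score_samples(["84", "74", "96"], ground_truth="84"):
    print(f"answer={answer!r}  reward={r}")
```

The key contrast with human-annotated training is visible in `reward`: the grader only has to check the final answer, which is cheap for problems where correctness can be verified automatically.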
While the outputs of this method seem more accurate, it also obscures the machine's “thinking” process a bit more for the humans trying to follow along. Asked to produce a reasoning trace for its answer, the model would sometimes flip between English and Chinese. It also produced explanations that ran to 10,000 words or more. And the method only worked particularly well for questions with clear right or wrong answers rather than more nuanced or subjective prompts.
Regardless, it's an interesting window into how DeepSeek managed to be competitive on a smaller budget. Still, the company itself has plenty of skepticism surrounding it because of its perceived closeness to the Chinese government. Most recently, researchers demonstrated to the Washington Post that the company's model would refuse to produce code, or would produce code with major security flaws, when the prompter indicates they are working with groups considered sensitive by the Chinese government. The researchers also found that the model would spit out less secure code when asked to produce work for Tibet, Taiwan, the Falun Gong religious movement, or the Islamic State.