After a failed attempt to train its next model on Huawei chips, DeepSeek's delayed release of a new generation of models highlights the difficulty of Beijing's push to replace American technology.
People familiar with the matter revealed that after DeepSeek released the R1 model in January, Chinese officials encouraged the company to use Huawei's Ascend processors instead of NVIDIA systems.
However, DeepSeek ran into persistent technical problems while training R2 on Ascend chips, so it switched back to NVIDIA chips for training and used Huawei chips only for inference. People familiar with the matter said this is the main reason the model was pushed back from its planned May release, causing DeepSeek to fall behind in the competition.
In this context, "training" refers to the model learning from large amounts of data, while "inference" refers to using the trained model to make predictions or generate responses. The episode shows that Chinese chips still lag their US competitors on key tasks, a challenge for China's pursuit of semiconductor self-sufficiency.
The Financial Times (FT) reported that Beijing has required Chinese technology companies to justify their orders of NVIDIA H20 chips, as a way of promoting domestic alternatives such as Cambricon.
Industry insiders say that compared with NVIDIA products, Chinese chips suffer from stability problems, slower chip-to-chip interconnect speeds, and inferior software.
It was reported that Huawei sent a team of engineers to DeepSeek's office to help develop the R2 model on its AI chips. Even so, DeepSeek was unable to complete a successful training run on Ascend chips. DeepSeek and Huawei then shifted their cooperation toward making the model compatible with Ascend chips for the inference stage.
People familiar with the matter revealed that DeepSeek founder Liang Wenfeng expressed dissatisfaction internally with the lack of progress on R2 and pushed to invest more time in building a more advanced model to maintain the company's leading position in the AI field. Another reason for the delay in R2's release is that data labeling for the updated model took longer than expected. According to Chinese media reports, the model is expected to be released in the coming weeks.
Ritwik Gupta, an AI researcher at the University of California, Berkeley, believes Huawei chips are experiencing "growing pains" in training but expects them to become suitable eventually: "Just because we haven't seen a leading model trained on Huawei chips today doesn't mean it won't happen in the future. It's just a matter of time."
Source report: "DeepSeek's next AI model delayed by attempt to use Chinese chips" (Financial Times)