The China-specific version of Gaudi 3 needs to significantly reduce its AI performance in order to comply with export regulations.

中国特供版的Gaudi 3需要大幅降低AI性能,才能合规出口。

Media recently reported that Intel is preparing to launch a "special version" of Gaudi 3 for the Chinese market in two hardware forms: an OAM-compatible mezzanine card called HL-328 and a PCIe accelerator card called HL-388. According to the report, Intel disclosed this information in its Gaudi 3 white paper, which states that HL-328 will be launched on June 24 and HL-388 on September 24.

近日,有媒体称,英特尔准备针对中国市场推出“特供版”Gaudi 3,包括名为HL-328的OAM相容夹层卡和名为HL-388的PCIe加速卡两种硬件形态。报道指出,英特尔在其Gaudi 3白皮书中披露了上述信息,其中HL-328将于6月24日推出,HL-388将于9月24日推出。
原创翻译:龙腾网 https://www.ltaaa.cn 转载请注明出处


Strikingly, estimates based on parameters such as core count, operating frequency and TDP suggest that the China "special version" HL-328 chip may deliver about 92% less performance than the international version of Gaudi 3.

令人震惊的是,基于内核数量、工作频率、TDP等参数估算,相比Gaudi 3国际版,中国“特供版”HL-328芯片性能或降低约92%。

What's different about the China special edition?

中国特供版有什么不同?

In terms of hardware specifications, the China-specific version of Gaudi 3 matches the original: 96MB of on-chip SRAM, 128GB of HBM2e high-bandwidth memory with 3.7TB/s of bandwidth, a PCIe 5.0 x16 interface, and the same decoding standards. However, under U.S. export control rules for AI chips, the total processing performance (TPP) of a high-performance AI chip of this kind must be below 4800 before it can be exported to China. This means that the 16-bit performance of the China-specific version of Gaudi 3 cannot exceed 150 TFLOPS.

具体硬件规格方面,中国特供版的Gaudi 3与原版相比,具有相同的96MB SRAM片上存储,128GB HBM2e高带宽存储,带宽为3.7TB/s,拥有PCIe 5.0 x16介面和解码标准。但是,由于美国对于AI芯片的出口管制规则限制,使得这类高性能AI的综合运算性能(TPP)需要低于4800才能出口到中国, 这意味中国特供版的Gaudi 3的16bit性能不能超过150 TFLOPS。

According to information released by Intel, Gaudi 3 can reach 1835 TFLOPS on FP16/BF16, which is 40% faster in large model training and 50% more efficient in inference than NVIDIA H100.

根据英特尔公布的资料显示,Gaudi 3在FP16/BF16上可以达到1835 TFLOPS,相比英伟达H100在大模型训练方面快40%、推理能效高50%。
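The "about 92%" estimate can be cross-checked against the two throughput figures quoted in this article. The snippet below is only a back-of-the-envelope sketch: it assumes the 150 TFLOPS ceiling and the 1835 TFLOPS FP16/BF16 figure are directly comparable, and the names used are illustrative rather than taken from any Intel document.

```python
# Figures quoted in the article. The 150 TFLOPS ceiling is the article's
# reading of the export rule, not an official HL-328 specification.
INTERNATIONAL_FP16_TFLOPS = 1835.0
CHINA_CAP_FP16_TFLOPS = 150.0


def performance_reduction_pct(original: float, capped: float) -> float:
    """Return the share of the original throughput that is given up, in percent."""
    return (original - capped) / original * 100.0


reduction = performance_reduction_pct(INTERNATIONAL_FP16_TFLOPS, CHINA_CAP_FP16_TFLOPS)
print(f"Reduction relative to the international version: {reduction:.1f}%")
```

Under these assumptions the cut comes out at roughly 92%, in line with the estimate cited earlier.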

Clearly, the China-specific version of Gaudi 3 must give up much of its AI performance before it can be exported in compliance with the rules. That means substantially cutting both the number of cores (the original version has 8 matrix math engines and 64 tensor cores) and the operating frequency.

显然,中国特供版的Gaudi 3需要大幅降低AI性能,才能合规出口。因此,中国特供版Gaudi 3需要大幅削减内核数量(原版拥有8个矩阵数学引擎和64 个张量内核)和工作频率。

In July last year, Intel released Gaudi 2 for the Chinese market. Compared with the international version of Gaudi 2, the accelerator cards launched for the Chinese market have little difference in performance, while the number of integrated Ethernet RDMA ports has been reduced from 24 to 21 to comply with US chip export control regulations.

去年7月,英特尔就发布了面向中国市场的Gaudi 2。相比国际版Gaudi 2,面向中国市场推出的加速卡在性能上差别不大,而集成以太网RDMA端口数量从24个端口减到21个,以符合美国芯片出口管制规定。

How the United States hijacks computing power

美国是如何劫持计算能力

In the 1990s, the United States accounted for more than one-third of global chip production; by 2020 that share had dropped to about 12%. To maintain its lead in semiconductors, the United States has imposed comprehensive semiconductor export controls on China since it issued the "CHIPS and Science Act" (hereinafter the "Chip Act") in August 2022. From the chips themselves to chip manufacturing equipment, the restrictions keep escalating.

20世纪90年代,美国占全球芯片产量的三分之一以上,这一份额到2020年已降至12%左右。为了维护半导体领域的领先地位,自2022年8月美国发布《芯片和科学法案》(CHIPS and Science Act,下称“《芯片法案》”)以来,美国对中国实施了全面的半导体出口管制,从芯片本身到芯片制造设备,限制措施不断升级。

The Chip Act is the centerpiece of the Biden administration's industrial revitalization policy. It uses U.S. government funds to restore domestic production of technology components critical to national security and economic growth. The act bars subsidized companies from the United States and its allies from building or expanding advanced-process chip factories in China and other countries of concern for ten years.

《芯片法案》是拜登政府复兴产业政策的核心,其利用美国政府资金恢复对国家安全和经济增长至关重要的技术部件的国内生产。该法案禁止获得补贴的美国及其盟友伙伴的企业十年内在中国和其他关切的国家新建或扩大先进制程芯片厂。

In October 2022 and October 2023, the U.S. Department of Commerce's Bureau of Industry and Security (BIS) twice issued export controls on advanced semiconductors and computing equipment bound for China, in an attempt to hold back China's advanced manufacturing. Many GPU and AI chip products from Nvidia, AMD and Intel can no longer be exported to China, and even the high-end gaming graphics card RTX 4090 has been restricted.

2022年10月、2023年10月,美国商务部工业和安全局(BIS)连续两次发布对中国的先进半导体和计算设备的出口管制,企图让中国先进制造受影响,并且英伟达、AMD、英特尔的多款GPU和 AI 芯片产品已不能再出口到中国,就连高端游戏显卡RTX 4090都受到了限制。

In December 2023, BIS announced an investigation into the semiconductor supply chain at mature process nodes, a move that blatantly targets China's chip industry.

2023年12月,美国商务部BIS宣布启动对成熟制程节点的半导体供应链展开调查,更是明晃晃地针对中国芯片半导体产业。


In the early morning of March 30 this year, Beijing time, the Bureau of Industry and Security (BIS) under the U.S. Department of Commerce issued new rules to "implement additional export controls", revising the two rounds of export restrictions that BIS introduced in October 2022 and October 2023. The rules comprehensively restrict sales to China of AI chips from Nvidia and AMD, as well as a wider range of more advanced AI chips and semiconductor equipment.

北京时间今年3月30日凌晨,美国商务部下属的工业与安全局(BIS)发布“实施额外出口管制”的新规措施,修订了BIS于2022、2023年10月制定的两次出口限制新规,全面限制英伟达、AMD以及更多更先进 AI 芯片和半导体设备向中国销售。

In the new rules, the big stick of sanctions is waved again. BIS has deleted and revised some restrictions on sales of semiconductor products to China from the United States, Macau, China and other places. Among the changes, a "presumption of denial" policy will apply to Macau, China and the D:5 country group, and AI semiconductor products exported from the United States to China will be subject to "case-by-case review", including comprehensive checks of technical level, customer identity, compliance plans and other information.

此次新规中,制裁大棒再次挥舞。BIS删除和修订了部分关于美国、中国澳门等地对华销售半导体产品的限制措施,包括中国澳门和D:5国家组将采取“推定拒绝政策”,并且美国对中国出口的 AI 半导体产品将采取“逐案审查”政策规则,包括技术级别、客户身份、合规计划等信息全面查验。

Where does Intel's courage come from?

英特尔的勇气来自哪里?

Although it is not yet on the market, Intel's special version of Gaudi 3 is very likely to bring some potential problems. For example, reduced performance may affect the user experience and application effects of Chinese enterprises; at the same time, if the special version of the chip does not have a price advantage, its market competitiveness may be affected to a certain extent. Therefore, Intel needs to make reasonable trade-offs in product design and pricing.

虽然还未上市,但英特尔的特供版Gaudi 3极有可能带来一些潜在的问题。例如,性能降低可能会影响中国企业用户体验和应用效果;同时,如果特供版芯片在价格上没有优势,那么其市场竞争力可能会受到一定影响。因此,英特尔需要在产品设计和定价等方面做出合理的权衡。


Two months ago, Nvidia's "special edition" AI chip H20 terminal products for China were available for pre-order. Product forms include computing cards and servers equipped with 8 H20 computing cards. From a performance point of view, the performance of Nvidia H20 is about one-sixth that of H100, but the price has not been significantly reduced, so the price/performance ratio is not high.

两个月前,英伟达对华“特供版”AI芯片H20的终端产品已可接受预订。产品形态包括计算卡和搭载8张H20计算卡的服务器。从性能上来看,英伟达H20性能约为H100的六分之一,但价格并未显著降低,因此性价比并不高。

At the beginning of this year, according to people familiar with the matter, large Chinese companies such as Alibaba and Tencent have been testing special chip samples from Nvidia since November last year. They have told Nvidia that the number of chips they order from Nvidia this year will be far less than the previously planned purchase of Nvidia's high-performance chips that have been banned.

今年年初,据知情人士透露,自去年11月以来,阿里巴巴、腾讯等中国大型企业一直在测试英伟达的特供芯片样本。他们已向英伟达表明,今年向英伟达订购的芯片数量将远远少于此前原计划购买的、已经被禁的英伟达高性能芯片。

Even though it faces the risk of declining revenue, Intel is still doing well through "careful budgeting". Nearly two years after the U.S. government's Chip Act was launched, veteran chip giant Intel announced in March that it had received up to $8.5 billion in government subsidies and up to $11 billion in special loan support. The subsidies reportedly come from the "Chip Act" introduced by the Biden administration in 2022, which aims to help chip companies build more chip factories in the United States and turn the country into a chip manufacturing power. Intel can currently be described as the biggest beneficiary of "chip manufacturing returning to the United States."

即便面临营收下滑风险,但是英特尔依旧在“精打细算”下过得不错。在美国政府《芯片法案》推出近2年后,老牌芯片巨头英特尔3月份宣布获得高达85亿美元的政府补贴以及多达110亿美元的特殊贷款支持。据了解,英特尔所获得的补贴支持来自于2022年拜登政府所出台的《芯片法案》,该法案力争帮助芯片公司在美国建造更多的芯片工厂,将美国打造为芯片制造强国,英特尔目前可谓是“芯片制造业回流美国”这一背景下的最大受益者。

In the AI market, Nvidia currently holds an overwhelming advantage in chips, and it will not be easy for Intel to win share with its products. Wells Fargo statistics show that Nvidia currently has a 98% share of the data center AI market, while AMD's share is only 1.2% and Intel's is less than 1%. For Intel, therefore, following the U.S. government closely is a prudent way to protect itself.

从AI市场看,目前英伟达在芯片市场占据着绝对优势,英特尔希望用产品撬走份额并不容易。富国银行统计显示,目前英伟达在数据中心AI市场拥有98%的市场份额,而AMD公司的市场份额仅有1.2%,英特尔则只有不到1%。因此对于英特尔来说,紧跟美国政府反而是明哲保身之举。

Computing power is in short supply, China substitution is underway

计算能力短缺,中国正在进行替代

Computing power is the productivity of the big data era. With the rapid development of the digital economy, and especially the explosion of AI, demand for computing power across society is growing rapidly. According to the "China Artificial Intelligence Computing Power Development Assessment Report 2023-2024" jointly published by IDC and Inspur Information, China's intelligent computing power is expected to grow at a compound annual rate of 33.9% between 2022 and 2027, reaching 1117.4 EFLOPS by 2027.

算力是大数据时代的生产力,伴随数字经济的高速发展,特别是AI的爆发,整个社会对算力的需求呈现快速增长态势。据IDC和浪潮信息联合推出的《2023-2024年中国人工智能计算力发展评估报告》显示,2022-2027年期间,预计中国智能算力规模年复合增长率达33.9%,到2027年智能算力规模达1117.4 EFLOPS。
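To make the growth figures concrete, the sketch below works backwards from the two numbers quoted from the IDC and Inspur report (a 33.9% compound annual growth rate and 1117.4 EFLOPS in 2027) to the 2022 starting point they imply. The 2022 value is derived here, not quoted in this article, and the function and variable names are purely illustrative.

```python
# Both inputs are the figures quoted in the article from the IDC/Inspur report.
CAGR = 0.339                 # compound annual growth rate, 2022-2027
TARGET_2027_EFLOPS = 1117.4  # projected scale in 2027
YEARS = 2027 - 2022


def implied_base(target: float, rate: float, years: int) -> float:
    """Invert compound growth: base = target / (1 + rate) ** years."""
    return target / (1.0 + rate) ** years


# The 2022 base is an inference from the two quoted figures, not a reported value.
base_2022 = implied_base(TARGET_2027_EFLOPS, CAGR, YEARS)

for offset in range(YEARS + 1):
    projected = base_2022 * (1.0 + CAGR) ** offset
    print(f"{2022 + offset}: {projected:7.1f} EFLOPS")
```

Under these assumptions the implied 2022 base is roughly 260 EFLOPS, meaning the quoted growth rate corresponds to a bit more than a fourfold increase over five years.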

At the same time, staff from the Southern Branch of the China Academy of Information and Communications Technology stated at CITE 2024 that China currently accounts for more than 30% of the world's intelligent computing power but relies mainly on GPU chips from the U.S. company Nvidia, with domestically developed computing power making up only 5%. The domestic usage rate of American AI frameworks such as TensorFlow, PyTorch and Caffe exceeds 90%.

与此同时,中国信息通信研究院南方分院的工作人员在CITE 2024上表示,目前中国智能算力全球占比超30%,主要依赖美国英伟达GPU芯片,国产自主算力份额仅为5%,国内TensorFlow、PyTorch、Caffe等美国AI框架使用率超过90%。

From an application perspective, mainstream Chinese chip makers such as Ascend, Cambricon and Iluvatar CoreX (Tianshu Zhixin) have completed adaptation to mainstream large models. Industry analysts believe that although a big gap remains relative to the advanced chips of Nvidia and AMD, domestic GPU chips such as the Ascend 910 series can basically support domestic large-model applications. Liu Qingfeng, chairman of iFlytek, said at last year's 1024 Developer Festival that Huawei's GPU capabilities are now comparable to the Nvidia A100, and that the "Flying Star One" large-model computing platform had been launched on the Ascend ecosystem. Before that, Cambricon's Siyuan (MLU) series of cloud AI accelerator cards had also been adapted to the "Zhixiang Multi-modal Large Model" developed in-house by Zhixiang Future, which claims to have reached the level of mainstream international products in both product performance and image quality.

从应用上来看,目前中国国内如昇腾、寒武纪、天数智芯等主流芯片厂商已完成对主流大模型的适配。业内分析认为,虽然相较于英伟达、AMD的先进芯片还有很大差距,但昇腾910系列等国产GPU 芯片目前基本可以支撑国内的大模型应用,科大讯飞董事长刘庆峰在去年1024 开发者节上曾表示,华为的GPU能力已能对标英伟达A100,并基于昇腾生态推出了“飞星一号”大模型算力平台。而在此前,寒武纪思元系列云端智能加速卡与智象未来自研的“智象多模态大模型”也已完成适配,其声称在产品性能和图像质量方面均达到了国际主流产品的水平。

China's large-scale substitution of imported AI chips is accelerating. For Intel, the key is how to meet U.S. policy requirements while still serving the Chinese market, keeping its products competitive and preserving the experience of its major customers. At the same time, the situation offers China's local AI chip makers a valuable opportunity to grow. These manufacturers need to pay close attention to market dynamics and technology trends to cope with potential competitive pressure.

中国大规模替代进口AI芯片的进程正在加速。对于英特尔来说,关键在于如何在满足美国政策要求的同时,兼顾中国市场需求,保持产品的竞争力和大客户体验。另一方面,这也为中国本土的AI芯片厂商提供了发展的宝贵机遇,这些厂商需要密切关注市场动态和技术发展趋势,以应对潜在的竞争压力。