中国深度求索(DeepSeek)公司表示,其热门人工智能模型的训练成本仅为 29.4 万美元。
China's DeepSeek says its hit AI model cost just $294,000 to train
译文简介
中国人工智能开发商深度求索(DeepSeek)表示,其 R1 模型的训练成本为 29.4 万美元,远低于美国竞争对手公布的数字。该数据出自一篇论文,这篇论文可能会重新引发关于中国在人工智能发展竞赛中地位的讨论。
正文翻译
BEIJING, Sept 18 (Reuters) - Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence.
The rare update from the Hangzhou-based company - the first estimate it has released of R1's training costs - appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.
DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia (NVDA.O).
Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.
The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. A previous version of the article published in January did not contain this information.
Training costs for the large-language models powering AI chatbots refer to the expenses incurred from running a cluster of powerful chips for weeks or months to process vast amounts of text and code.
Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that the training of foundational models had cost "much more" than $100 million - though his company has not given detailed figures for any of its releases.
Some of DeepSeek's statements about its development costs and the technology it used have been questioned by U.S. companies and officials.
The H800 chips it mentioned were designed by Nvidia for the Chinese market after the U.S. in October 2022 made it illegal for the company to export its more powerful H100 and A100 AI chips to China.
U.S. officials told Reuters in June that DeepSeek has access to "large volumes" of H100 chips that were procured after U.S. export controls were implemented. Nvidia told Reuters at the time that DeepSeek has used lawfully acquired H800 chips, not H100s.
In a supplementary information document accompanying the Nature article, the company acknowledged for the first time it does own A100 chips and said it had used them in preparatory stages of development.
"Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote. After this initial phase, R1 was trained for a total of 80 hours on the 512 chip cluster of H800 chips, they added.
Reuters has previously reported that one reason DeepSeek was able to attract the brightest minds in China was because it was one of the few domestic companies to operate an A100 supercomputing cluster.
MODEL DISTILLATION
DeepSeek also responded for the first time, though not directly, to assertions from a top White House adviser and other U.S. AI figures in January that it had deliberately "distilled" OpenAI's models into its own.
北京,9 月 18 日(路透社)—— 中国人工智能开发商深度求索(DeepSeek)表示,其 R1 模型的训练成本为 29.4 万美元,远低于美国竞争对手公布的数字。该数据出自一篇论文,这篇论文可能会重新引发关于中国在人工智能发展竞赛中地位的讨论。
这家总部位于杭州的公司此前鲜有此类动态发布,此次公布的 R1 模型训练成本也是其首次披露。该数据见于周三发表在学术期刊《自然》上的一篇同行评审文章中。
今年 1 月,深度求索曾发布据称成本更低的人工智能系统。消息一出,全球投资者纷纷抛售科技股,因为他们担心这些新模型可能会威胁到英伟达等人工智能领军企业的主导地位。
自那以后,除发布少数几款新产品更新外,深度求索及其创始人梁文峰基本淡出了公众视野。
这篇《自然》论文将梁文峰列为合著者之一,文中提到,深度求索这款主打推理功能的 R1 模型,训练成本为 29.4 万美元,训练过程使用了 512 块英伟达 H800 芯片。而该论文在今年 1 月首次发表的版本中并未包含这些信息。
支撑人工智能聊天机器人运行的大型语言模型,其训练成本指的是:运行一组高性能芯片数周或数月,以处理海量文本和代码所产生的费用。
美国人工智能巨头 OpenAI 的首席执行官山姆・奥特曼曾在 2023 年表示,基础模型的训练成本 “远高于” 1 亿美元,不过该公司尚未公布任何一款模型的详细成本数据。
深度求索关于其开发成本及所用技术的部分说法,已受到美国企业和官员的质疑。
该公司提到的 H800 芯片,是英伟达专为中国市场设计的产品。此前在 2022 年 10 月,美国出台规定,禁止英伟达向中国出口性能更强的 H100 和 A100 两款人工智能芯片,H800 芯片由此应运而生。
今年 6 月,美国官员向路透社透露,深度求索能够获取 “大量” H100 芯片,且这些芯片是在美国实施出口管制后采购的。英伟达当时则对路透社表示,深度求索使用的是合法采购的 H800 芯片,而非 H100 芯片。
在《自然》论文附带的补充信息文件中,深度求索首次承认该公司确实拥有 A100 芯片,并表示这些芯片用于研发的准备阶段。
研究人员在文件中写道:“在 DeepSeek-R1 的研究过程中,我们使用 A100 GPU 为小型模型的实验做准备。” 他们补充称,在这一初始阶段之后,研发团队使用由 512 块 H800 芯片组成的集群,对 R1 模型进行了总计 80 小时的训练。
路透社此前曾报道,深度求索之所以能吸引中国顶尖人才,原因之一在于它是国内少数拥有 A100 超级计算集群并投入运营的企业。
模型提炼
今年 1 月,美国白宫一位高级顾问及其他美国人工智能领域人士曾声称,深度求索蓄意将 OpenAI 的模型 “提炼” 到自家模型中。此次,深度求索虽未直接回应,但也首次对此类说法作出了回应。
评论翻译
所以DeepSeek训练出一个顶尖人工智能模型,成本居然还不到俄亥俄州一套房子的价格?兄弟,我的学生贷款都比这玩意儿贵。
KR4T0S Its better to say things will cost ten times as much as they actually do when you are trying to fool a government full of idiots to give you money though.
不过,要是想糊弄一群蠢货扎堆的政府、从他们那儿骗钱,那最好还是把成本往实际金额的十倍上吹。
petr_bena ten thousand times more as it seems
看起来似乎是一万倍才对。
MayorMcCheezz They used 512 H800 nvidia gpus. They go for like 70k new. Ebay listing for 39k for a used one.
他们使用了 512 块英伟达H800 GPU。这款显卡全新的售价约为 7 万美元,而在 eBay 上,二手的挂牌价为 3.9 万美元。
caesar_7 That's why - they've sold them afterwards for more than they bought them! They almost made a profit and then Deepseek training cost would be negative :D
这就是原因所在 —— 他们后来把(这些显卡)卖掉了,而且卖价比买价还高! 他们几乎都能从中获利了,这样一来,DeepSeek的训练成本甚至可能变成负数
Mundane_Baker3669 Yeah you have way too much purchasing power than most other people in the world
是啊,你的购买力比世界上大多数人都要强得多。
lucellent It's cheaper to train when your dataset comes from other top AI models, like ChatGPT.
如果你的数据集来自 ChatGPT 等其他顶尖人工智能模型,那么训练成本会更低。
Hi_Im_Dadbot So, those models stole data from everyone else to train them and then someone else came along and stole from them for training? How unfair!
所以,那些模型先是盗用了所有人的数据来训练自己,现在又有人过来盗用它们的数据去训练模型?这也太不公平了!
Thoughtulism It's just grifting all the way down.
说到底,这从头到尾都是一场骗局。
Despeao Obviously it's unfair. Where is the Honor among Thieves ? A pirate shouldn't steal from another /s
显然这很不公平。“盗亦有道” 何在?海盗总不该偷同行吧(反讽)
ComprehensiveSwitch I don’t think that’s the takeaway here. The main takeaway is that synthetic data is largely better than organic data, and you can get much better results from using it.
我认为这并非此处要传达的核心信息。关键在于,合成数据在很大程度上优于原始数据,而且使用合成数据能获得好得多的结果。
DeathUponIt Where did you go and what did you go for?
你去了哪里?去做什么了?
Actual__Wizard Actually, the Ohio AI models are even cheaper than the Chinese ones. Which makes complete sense actually.
实际上,俄亥俄州的人工智能模型甚至比中国的还要便宜。其实这完全说得通。
togilvie Yeah this is why the Chinese government is stepping in to subsidize Huawei chips and push folks away from NVIDIA. Seems legit.
是啊,这就是中国政府为何要介入,为华为芯片提供补贴,并推动相关方减少对英伟达芯片依赖的原因。这做法似乎合情合理。
TotallyNotABob Wait a sec, Huawei is making their own chips now? IDK why but I was under the impression they were still using TSMC products.
等一下,华为现在开始自己生产芯片了吗?我不知道为什么,但我之前一直以为他们还在使用台积电代工的产品。
Arcosim China's EUV machine is currently entering trial production right now. So give them another 4 to 5 years to iron things out, solve problems, then another year for entering mass production, then another year from integration and training, and in 7-8 years from now China will be completely semiconductor independent. Basically the only country in the world with a full end-to-end semiconductor chain, from mining and processing the raw materials, to building the lithography, to design and fab the chips, to package and integrate them. Edit: In short, the same scenario that happened with solar panels, and then with EVs, will happen with semiconductors in a few years. Expect GPUs coming from China with similar or even better performance metrics to cost a small fraction of Nvidia and AMD GPUs.
目前,中国的极紫外光刻机(EUV)已进入试生产阶段。若再给中国 4 至 5 年时间解决现存问题、完善技术,之后还需 1 年时间实现量产,再用 1 年完成技术整合与应用磨合,那么从现在起 7 至 8 年后,中国将实现半导体产业的完全自主可控。届时,中国基本会成为全球唯一拥有完整半导体端到端产业链的国家 —— 从原材料的开采与加工,到光刻机的制造,再到芯片的设计与晶圆制造,最后到芯片的封装与集成,整个流程均能自主完成。 补充说明:简而言之,未来几年内,半导体产业将重现太阳能电池板、电动汽车产业曾经历的发展历程(即中国实现技术突破并占据优势)。届时,预计中国生产的GPU,在性能指标与英伟达、AMD产品相当甚至更优的情况下,价格仅为后两者产品的一小部分。
david1610 Good, I want a few sota GPUs for my own curiosity. However they are like $5k each on the low end and a few years old
好的,我出于个人好奇,想要几块当前最先进的GPU。但这些显卡哪怕是入门款,单价也得 5000 美元左右,而且还是几年前的旧型号了。
June1994 But this will destroy America.
但这会摧毁美国。
david1610 I'm not American, however visit regularly. It'll only be bad for stock owners, really, and a few who work in the industry. I'd be more worried about peoples 401k getting rolled up into something ultimately unprofitable. The sooner it pops the better for everyone involved if it can't back up the valuations.
我不是美国人,但会定期去美国。说到底,这件事只会对股东不利,顶多再影响到一些行业从业者。 我更担心的是,人们的 401K 养老金会被套进最终无法盈利的项目里。如果这些的估值无法支撑,那么泡沫越早破裂,对所有相关人员来说反而越好。
June1994 I was being sarcastic. Yeah, I hope Huawei succeeds so we can get cheaper gaming GPUs.
我刚才是在反讽。 是啊,我真心希望华为能成功,这样我们就能买到更便宜的游戏显卡了。
finallytisdone Huawei has chips manufactured by China’s SMIC now. They are trying to claim that they are at the same technology level as TSMC/Nvidia’s leading edge chips (dubious), but SMIC doesn’t have access to EUV technology. They are making these small feature sizes with multiple patterning which means a much lower yield and much higher cost.
现在华为的芯片由中国的中芯国际制造。华为试图宣称这些芯片的技术水平与台积电、英伟达的尖端芯片相当(这一点存疑),但中芯国际并未掌握极紫外光刻机技术。中芯国际通过多重曝光技术实现了这种小尺寸工艺,而这意味着芯片的良率会低得多,成本也会高得多。
joninco Never doubt Chinas ability to manufacture shit… they’ll figure it out, only a matter of time.
别怀疑中国造东西的能力…… 他们总能搞定,只是时间问题而已。
tommos Actually Huawei explicitly says that they are behind on single chip to chip performance but can achieve or exceed Nvidia performance with better networking tech when a bunch of chips are arranged in a node.
实际上,华为明确表示,其单芯片性能目前仍落后,但通过更先进的网络技术,将多枚芯片组成一个节点集群后,整体性能能够达到甚至超过英伟达。
bjran8888 Your information is outdated. Huawei's Ascend chips have long been domestically produced. Interestingly, not a single American here has heard of Cambricon. Please search.
你的信息过时了。华为的昇腾芯片早已实现国产化。 有趣的是,这里没有一个美国人听说过寒武纪。 你可以搜一搜。
TotallyNotABob Should I clap too?
那我该鼓掌吗?
bjran8888 Why not? From the perspective of human progress, humanity as a whole has become more powerful. From an individual standpoint, you'll be able to buy cheaper, more cost-effective chips in the future. Isn't that a good thing?
为什么不呢? 从人类进步的角度来看,人类整体的能力已经变得更强了。 从个人角度来说,未来你能买到更便宜、性价比更高的芯片。 这难道不是一件好事吗?
MarcoGWR Not accurate. It's SMIC. Actually China now can make its own domestic 5~7nm chip by all China's equipments.
不准确。应该是中芯国际。 事实上,中国现在已经能够用全套国产设备制造自主研发的 5 至 7 纳米芯片了。
ImportantCommentator They didnt say they couldnt do more with better gpus...
他们没说有了更好的GPU,就无法实现更多功能……
TopTippityTop From what I read they didn't factor the cost of purchasing the GPU, which is the major sum.
根据我了解到的信息,他们没有把购买GPU的成本算进去,而这部分成本占比很大。
simulated-souls Most teams rent GPUs instead of buying them, so the rental cost is what matters. GPU rentals are around 3 dollars an hour in America, so if you take their numbers (512 GPUs x 80 hours), you get 122K, which is in the ballpark.
大多数团队会租赁GPU而非购买,因此租赁成本才是关键。 在美国,GPU 的租赁费用约为每小时 3 美元。若按他们给出的数据(512 块 GPU × 80 小时)计算,得出的费用约为 12.2 万美元,这个结果是比较合理的。
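The rental arithmetic in the comment above is easy to verify; a minimal sketch, assuming the commenter's $3-per-GPU-hour rate (which is an assumption from the comment, not a figure from the Nature paper):

```python
# Back-of-the-envelope check of the rental-cost estimate in the comment above.
# The $3/GPU-hour rate is the commenter's assumption, not from the Nature paper.
num_gpus = 512               # cluster size reported in the Nature paper
train_hours = 80             # R1 training time reported in the Nature paper
rate_usd_per_gpu_hour = 3.0  # assumed US rental price per H800-class GPU

rental_cost = num_gpus * train_hours * rate_usd_per_gpu_hour
print(f"Estimated rental cost: ${rental_cost:,.0f}")  # -> $122,880, i.e. roughly 122K
```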
teethgrindingaches Why would they factor in the purchase cost when the number being reported is specifically the training cost? That's standard practice in the industry; Anthropic for example said theirs cost "a few tens of millions" to train.
既然所公布的数字明确是训练成本,他们为何还要把购买成本算进去呢?这在行业内是标准做法,比如 Anthropic(译者注:Anthropic是一家位于美国加州旧金山的人工智能股份有限公司)就曾表示,其模型的训练成本为 “数千万美元”。
pain_au_choc0 Yeah, like 15m worth of gpus. Also many misleading data sets here, somewhere I found this as i didnt read the paper: The paper stated that DeepSeek used 512 Nvidia H800 chips to train the reasoning-focused model over a period of 80 hours. So 80h was enough for trial and error? Yeah sure
是啊,大概相当于 1500 万美元的GPU。 而且这里还有很多误导性的数据集,我没读过那篇论文,但在别处看到了这样的信息: 该论文称,DeepSeek使用 512 块英伟达H800 芯片,耗时 80 小时训练出了这款聚焦推理能力的模型。 所以 80 小时就足够用来反复试验了?呵呵,怕是不太可能吧。
lancelongstiff At $0.10 per kWh that would've cost $1,436 to run 512 of those GPUs for 80 hours. So I'm guessing some of the remaining $292,500 went on leasing the GPU time.
按每千瓦时(kWh)0.10 美元的电价计算,运行 512 块这样的GPU 80 小时,电费成本约为 1436 美元。 因此我推测,剩余的 29.25 万美元,有一部分可能用于租赁 GPU 的使用时长。
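For readers checking the electricity figure, a rough sketch of how $1,436 falls out is shown below; the ~350 W average draw per GPU is an assumption introduced here to reproduce the commenter's number, and real H800 power draw (plus cooling and host overhead) may differ:

```python
# Rough reconstruction of the ~$1,436 electricity estimate in the comment above.
# The 350 W per-GPU average draw is an assumption; actual power may be higher.
num_gpus = 512
train_hours = 80
power_kw_per_gpu = 0.35      # assumed average draw per GPU, in kW
price_usd_per_kwh = 0.10     # electricity price used in the comment

energy_kwh = num_gpus * power_kw_per_gpu * train_hours
print(f"{energy_kwh:,.0f} kWh -> ${energy_kwh * price_usd_per_kwh:,.0f}")  # ~14,336 kWh -> ~$1,434
```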
RogueHeroAkatsuki Probably they didnt factor a lot of other costs too. Just like buying new house and claiming you bought it for cost of new lock to door.
他们或许还遗漏了很多其他成本没算进去。这就好比买了一套新房子,却宣称自己只花了换门锁的钱。
-oshino_shinobu- People clearly don’t know what’s going on. Companies don’t necessarily buy GPUs, they rent them.
很明显,人们根本不清楚实际情况。企业不一定会购买GPU,他们更多是选择租赁。
bobartig That figure also leaves out pretraining. Nobody in the AI community thinks the $300k is correct. If anything, it's one RL post-training run after spending $6,000,000 for pretrain on a $1,300,000,000 datacenter. That also leaves out the cost of top-flight data scientists and AI Researchers. Someone remind me, are those cheap to come by these days?
这个数字还漏掉了预训练成本。人工智能领域没人认为 30 万美元这个金额是准确的。实际上,这 30 万美元可能只是一次强化学习(RL)后续训练的成本,而在此之前,光是在一个造价 13 亿美元的数据中心里进行预训练,就已经花了 600 万美元。 此外,这还没算上顶尖数据科学家和人工智能研究员的人力成本。谁来提醒我一下,如今这些人才的薪资很便宜吗?
Getafix69 I still think Deepseek gives better answers most of the time as well but that could just be my bias at seeing it's whole thought process before the answer.
我仍然认为,大多数情况下Deepseek给出的答案也更好,但这可能只是我的偏见 —— 因为在看到答案之前,我能看到它完整的思考过程。
paradawx The latest update to ChatGPT 5 blocks a lot of functionality that's considered malicious. For example, I had a practice exam and needed help understanding a question, and it refused because it thought it was a live exam; it also refused to change Windows settings through PowerShell to disable SmartScreen, which was bugging out. Whereas DeepSeek just gives you the answer.
ChatGPT 5 的最新更新屏蔽了许多被判定为恶意的功能。比如,我有一份练习题需要帮忙理解其中一道题目,它却拒绝提供帮助,因为它认为这是一场实时考试;此外,当我想通过 PowerShell 修改 Windows 设置以关闭出现故障的 SmartScreen 时,它也拒绝了。而DeepSeek则会直接给你答案。
Swimming_Goose_7555 It blows chat GPT out of the water with most things I give it
在我给它的大多数任务里,它的表现都远超 ChatGPT。
LegnaOnFire There is also another AI out there that has been generating better images than ChatGPT and gemini, and doesn't lock you out after 3 images. It is crazy how China is apparently spending less money to produce AI yet obtaining better results than ChatGPT.
目前还有另一款AI工具,生成的图像比 ChatGPT 和 Gemini 更出色,而且不会在生成 3 张图像后就限制你使用。 中国在人工智能研发上的投入显然更少,却能取得比 ChatGPT 更优的成果,这着实令人惊讶。
mal73 I don’t believe them when they say they spend less money but there is no question that the Chinese talent and speed in the AI-space is incredible.
他们说投入的资金更少,我是不信的,但毫无疑问,中国在人工智能领域的人才实力和发展速度确实令人惊叹。
ILorwyn Wanna mention which one that is?
能说说具体是哪一个AI工具吗?
EstebanMolinos Nano banana if I had to guess
硬猜的话,我觉得是Nano banana(译者注:这是谷歌发布的新一代AI图像生成与编辑模型,Gemini 2.5 Flash Image代号“Nano Banana”))。
lidekwhatname they said better than gemini, i would assume nano banana falls under the gemini umbrella
他们都说比 Gemini 更好,我觉得 Nano banana可能也属于 Gemini 体系下的产品。
Llamasarecoolyay This is just straight up false.
这完全是假的。
KingofRheinwg Check out kimi k2, I used to use deepseek but k2 is definitely a step up for my uses.
可以了解一下 Kimi K2,我以前用的是DeepSeek,但对我个人的使用需求来说,K2 无疑更胜一筹。
countsmarpula It’s as if the tech companies in the US are laundering money ¿
美国的科技公司看起来就像是在洗钱,不是吗?
binaryghost01 I'd say its more probable that they are just incompetent
我觉得更有可能的情况是,他们只是能力不足而已。
pimpeachment Hanlon's razor
汉隆剃刀原则(译者注:这个有点超出知识水平,汉隆剃刀定律是一种哲学观念,或者是思维模型,其核心立意是“能解释为愚蠢的,就不要解释为恶意的。”)
King_of_the_Nerds I believe Hanlon’s Razor works better on the small scale attributed to individuals or small groups of people. Is there a razor that states that corporations will cut corners and fuck over everyone ever in search of an extra nickel? Stockholder’s Razor? CEO’s Razor? Late Stage Capitalism’s Razor?
我认为汉隆剃刀更适用于个人或小规模群体层面的情况。有没有这样一条 “剃刀原则”—— 认为企业为了多赚哪怕一分钱,总会走捷径,并不惜损害所有人的利益? 股东剃刀原则? CEO 剃刀原则? 晚期资本主义剃刀原则?(译者注:此处就不过多解释,因为我也不懂,有兴趣的朋友可以自己去搜搜)
countsmarpula I’d say it’s more probable that there is massive corruption. Not exactly a theory.
我认为更有可能存在大规模腐败。这算不上是什么理论。
binaryghost01 I mean, the corruption would be to get more money and do bigger things... But they would still be incompetent and unsustainable because they can't make use of few resources in order to do big things. In the long term, Deepseek is like a whale next to fishes because they can provide the same or better value while spending much less resources. Also, although the AI industry is very hyped and ought to bring us many interesting things, data estimates it wont be as big as the Carbon market.
我的意思是,腐败的目的或许是为了获取更多资金、做成更大的事…… 但他们依然会能力不足且难以持续,因为他们无法利用有限的资源去达成宏大目标。 长期来看,Deepseek就像鱼类中的鲸鱼 —— 它能提供同等甚至更优的价值,同时消耗的资源却少得多。 此外,尽管人工智能行业炒得火热,也理应给我们带来许多有趣的成果,但数据显示,它的规模最终不会有碳市场那么大。
refboy4 There is a lot of fairly convincing evidence that a vast majority of the code base behind DeepSeek was copied/ stolen from OpenAI. It’s a whole lot cheaper to develop software when you don’t really have to develop software, just wait for someone else to do it/ fund it.
有大量相当有说服力的证据表明,DeepSeek背后的绝大部分代码库是从 OpenAI 复制 / 窃取而来的。 当你实际上无需真正开发软件,只需等待他人去开发 / 为其提供资金时,开发软件的成本会低得多。
bjran8888 That's hilarious. So why does Deepseek require so much less computational power than ChatGPT? After the open-source Deepseek emerged, all American models started copying its technology. ChatGPT isn't open-source—who the hell can verify whether Sam Altman's claims are true? This company should rename itself CloseAI.
这太好笑了。那为什么Deepseek需要的算力比 ChatGPT 少这么多呢? 自从开源的Deepseek问世后,所有美国的模型都开始抄袭它的技术了。 ChatGPT 又不是开源的 —— 谁能去核实山姆・奥特曼说的话是真是假啊?这家公司怕不是该改名叫 “封闭人工智能”(CloseAI)才对。
david1610 This is incorrect, it's the stylistic output of the model's writing, not the codebase. The analysis would need both OpenAI's codebase and Deepseek's, so it's not possible. Deepseek released the trained model weights, so we know it wasn't stolen without significant modification from OpenAI, otherwise they'd say so. What is more likely is that Deepseek trained their model partially on output from OpenAI. It's what I'd do as a final step. I mean it's exactly what these companies did to publications online and authors, so I don't really feel sorry for them.
这种说法是不正确的,这是模型输出文本的风格特点,而非代码库的问题。 要进行这类分析,需要同时获取 OpenAI 和Deepseek双方的代码库,因此是不可能实现的。 Deepseek已经公开了其训练后的模型权重,由此可知,它并非未经大幅修改就从 OpenAI 窃取而来;否则,OpenAI 方面早就会发声了。 更有可能的情况是,Deepseek在训练其模型时,部分使用了 OpenAI 的模型输出内容。 如果是我,也会把这一步作为最后环节。毕竟,这些公司之前就是这么对待网络上的出版物和作者的,所以我实在没法对它们产生同情。
countsmarpula Wow, I’m not sure I knew that.
哇,我之前好像还真不知道这事。
refboy4 Nothing has been “proven”, but there are some US engineers that pulled apart their code and a huge percentage was identical to stuff from OpenAI, except they stripped down/out some of the more complex stuff. That’s why it “runs faster”
目前没有任何事情被 “证实”,但有一些美国工程师拆解过DeepSeek的代码,发现其中很大一部分内容与 OpenAI 的代码完全相同,只是他们删减了部分更复杂的模块。这就是它 “运行速度更快” 的原因。
Familiar_Resolve3060 Is that a doubt?
这是在质疑吗?
BidenGlazer Yes, certainly it's more likely that there's a massive conspiracy theory going on in the US rather than Deepseek lying.
是的,显然更有可能的情况是,美国正在上演一场大规模的阴谋论,而非Deepseek在撒谎。
countsmarpula Is this sarcasm? I can’t tell.
这是在讽刺吗?我分辨不出来。
IRequirePants Deepseek uses ChatGPT for distillation
Deepseek使用 ChatGPT 进行模型提炼。
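As background on what "distillation" means in these comments, below is a toy sketch of training a small student model to match a teacher's softened outputs. It is a generic illustration with invented names and sizes, not DeepSeek's or OpenAI's actual pipeline:

```python
# Toy logit-distillation sketch in NumPy: a small "student" classifier is trained
# to match a "teacher's" softened output distribution. Illustrative only; the
# models, sizes and training loop here are invented, not any company's pipeline.
import numpy as np

rng = np.random.default_rng(0)
d_in, n_classes, temp, lr = 8, 4, 2.0, 0.5

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

teacher_W = rng.normal(size=(d_in, n_classes))  # stands in for a large teacher model
student_W = np.zeros((d_in, n_classes))         # student starts from scratch

X = rng.normal(size=(256, d_in))                # unlabeled inputs fed to the teacher
soft_targets = softmax(X @ teacher_W, temp)     # teacher's softened predictions

for _ in range(200):                            # minimize cross-entropy to the teacher
    probs = softmax(X @ student_W, temp)
    grad = X.T @ (probs - soft_targets) / len(X)
    student_W -= lr * grad

# The student now imitates the teacher without ever seeing human-labeled data.
print(np.abs(softmax(X @ student_W, temp) - soft_targets).mean())
```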
Zealousideal_Low1287 Is this supposed to include pre training, everything? Or just the reinforcement learning for the conversational agent?
这是否应该包含预训练等所有环节?还是只包含对话智能体的强化学习部分?
simulated-souls This seems to only include RL post-training. Pre-training was said to cost 5M dollars, which was still way cheaper than other comparable models at the time.
这似乎只包含强化学习后期训练阶段。 据称,其预训练成本为 500 万美元,即便如此,这在当时仍远低于其他同类模型的成本。
Splurch Based on last time they made these claims it’s based on whatever numbers they feel like including instead of reality.
根据他们上次提出这些说法的情况来看,其依据不过是他们随意选择纳入的数字,而非客观事实。
Throwaway__shmoe It doesn’t include a lot of things, most importantly the bridge they’re trying to sell dumbasses in the west.
这其中没有包含很多东西,最重要的是他们正试图兜售给西方蠢货的那座 “桥”(译者注:英语中 “卖桥” 是形容兜售骗局的说法)。
AssimilateThis_ Training cost is not the problem in the long run, it's inference cost.
从长远来看,训练成本并非问题所在,真正的问题是推理成本。
Legionof1 Is that just the cost to train on existing hardware or is it the cost of all the hardware too?
这指的仅仅是在现有硬件上进行训练的成本,还是也包含了所有硬件本身的成本?
dftba-ftw The article is basically saying that additional training information has refined the estimate. This is the additional info: "Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote. After this initial phase, R1 was trained for a total of 80 hours on the 512-chip cluster of H800 chips, they added. So yes, they just estimated the cost to run 512 H800s for 80 hours. It doesn't include purchasing the chips or the payroll or anything of that nature.
这篇文章的核心内容是,新增的训练相关信息细化了之前的成本估算。 以下是新增信息: “在开展 DeepSeek-R1 的相关研究时,我们先使用 A100 GPU 为小型模型的实验做准备,” 研究人员写道。他们补充道:“完成这一初始阶段后,我们在由 512 块 H800 芯片组成的集群上,对 R1 模型进行了总计 80 小时的训练。” 所以情况很明确,他们只是估算了 512 块 H800 芯片运行 80 小时所需的成本,并未包含芯片采购费用、人员薪资或其他此类性质的支出。
Unfair_Cicada Is China electric powered by cheap renewable clean energy cheaper than our fossils fuels? I heard AI is all about energy efficiency. I don’t know much. Please enlighten me.
中国以廉价可再生清洁能源发电,其成本是否比我们使用化石燃料发电更低?我听说人工智能的核心在于能源效率。我对此了解不多,还请赐教。
theassassintherapist Yes. 3 gorges dam, tons of wind turbines, thorium nuclear plants, and solar panels. And as of last year, coal generation only made up 53% of all power in China and declining.
是的。中国有三峡大坝、大量风力发电机、钍核电站,以及太阳能电池板。而且截至去年,煤炭发电量在中国总发电量中的占比已降至 53%,且这一比例仍在持续下降。
Unfair_Cicada Wow! I didn’t even know coal is still being used today. I read somewhere that all we need is to build a solar farm the size of New York City and we would have solved our energy needs. If it’s so simple and cheap why isn’t it being done? I think I am just mistaken or something. Please enlighten me.
哇!我都不知道现在还在使用煤炭发电。我之前在哪看到过,说我们只需要建造一个纽约市那么大的太阳能发电场,就能满足所有能源需求了。要是这事儿真这么简单又便宜,为什么没人去做呢?我觉得可能是我搞错了或者别的什么原因,还请赐教。
Mentallox China is a ginormous energy sink and they are building all types of energy including additional coal plants. China doesn't have a lot of natural gas sources in the country but they have coal, so coal plants, nuclear, hydro, solar are all being built to meet their needs. The fact they are the world's top renewable energy producer is true, they have more solar than the rest of the world combined, but the same is true of their coal.
中国是一个巨大的能源消耗国,目前正在建设各类能源项目,其中也包括新增的燃煤电厂。中国国内的天然气资源储量并不丰富,但拥有大量煤炭资源,因此燃煤电厂、核电站、水电站以及太阳能电站都在同步建设,以满足其能源需求。中国是全球最大的可再生能源生产国,这一说法属实 —— 其太阳能装机容量超过了世界其他国家的总和;但与此同时,中国的煤炭产能同样位居世界前列,这一点也符合事实。
theassassintherapist "Wow! I didn’t even know coal is still being used today" ??? What do you mean? Even the US is still using coal. "If it’s so simple and cheap why isn’t it being done?" And it is being done, which is why their clean energy generation is rising every year, as shown in that article I posted. I have doubts that an NYC-sized solar farm is enough to power all of China, but even then they would still need to allocate and buy land to build solar panels. Having all that in a single area the size of NYC is beyond stupid since you'll lose so much energy trying to transmit it thousands of miles to every part of the country.
哇!我都不知道现在还在使用煤炭发电。 ???你什么意思?就连美国现在也还在使用煤炭发电啊。 要是这事儿真这么简单又便宜,为什么没人去做呢? 而且这事儿确实有人在做。我之前发的那篇文章里就提到了,这也是为什么他们的清洁能源发电量每年都在增长。我怀疑建一个纽约市那么大的太阳能发电场,根本不足以满足中国全国的用电需求;但即便足够,他们也得划拨和购买土地来安装太阳能电池板。把所有太阳能设施都集中在一个纽约市大小的区域里,这简直蠢透了 —— 因为要把电能输送到数千英里外的全国各地,途中会损耗大量能源。
Unfair_Cicada I was referring to USA powering our country using cheap clean energy so we can compete with China mega power projects.
我指的是,美国应该用廉价的清洁能源为本国供电,这样我们才能与中国的超级能源工程展开竞争。
theassassintherapist yeah...that's not going to happen with this administration.
是啊…… 在这届政府任内,这事儿是不可能发生的。
The_Billy My current understanding is that while solar is very cheap to build, it's not as profitable to build compared to a coal plant. This is because energy pricing is based on how much is available. When it's sunny (or windy) all the solar panels/wind turbines generate electricity and drive the cost of energy down. The more you build, the cheaper the energy is. As a result, you still have developers investing in coal. I think in order to get solar to be more widespread in the US we'd need one of the following: more research into cheap sustainable energy storage, followed by building that out; reducing the regulatory burden on things like balcony or rooftop solar; or stopping subsidies to natural gas, oil, and coal, combined with subsidizing solar. I'm not an expert, but the way the internet works, someone will come correct me if I'm wildly off base here.
我目前的理解是,虽然太阳能发电设施的建设成本很低,但与燃煤电厂相比,其建设的盈利性并不高。这是因为能源定价取决于能源的可用供应量:当阳光充足(或风力充足)时,所有太阳能电池板(或风力发电机)都会同时发电,进而拉低能源价格。而且你建设的太阳能(或风能)设施越多,能源价格就会越低。 正因如此,仍有开发者会投资燃煤电厂。我认为,要想让太阳能在美国得到更广泛的应用,我们需要实现以下情形之一: 加大对低成本可持续能源存储技术的研究力度,随后大规模推广该技术。 减轻阳台太阳能、屋顶太阳能等小型太阳能设施在审批监管方面的负担。 停止对天然气、石油和煤炭行业的补贴,同时为太阳能行业提供补贴。 我并非这方面的专家,但互联网的运作方式就是如此 —— 如果我的观点存在严重偏差,肯定会有人来纠正我。
Unfair_Cicada For a country to thrive I think energy should be made as cheap and abundant as possible. Only with a solid foundation can we build awesome infrastructure. Energy for profit should be regulated.
我认为,一个国家要实现繁荣发展,就应尽可能让能源变得廉价且充足。唯有打下坚实的基础,我们才能建设完善优良的基础设施。对于以盈利为目的的能源产业,应当加以监管。
The_Billy While I agree with you in principle, the fact remains we live in a capitalist society where market forces somewhat dictate the outcome. I think if solar and wind become more profitable (not just more cheap) we'll see investors much more likely to take a chance.
虽然我原则上同意你的观点,但现实情况是,我们生活在一个资本主义社会,市场力量在一定程度上决定着最终结果。我认为,如果太阳能和风能能变得更有利可图(不只是成本更低),那么投资者大概率会更愿意尝试。
IRequirePants China builds a huge amount of coal plants. And also imports a huge amount of coal from the US
中国修建了大量燃煤电厂,同时也从美国进口大量煤炭。
Theappunderground China has one experimental thorium reactor, it doesnt make power for the grid. The united states uses coal to generate 15% of its total power.
中国拥有一座实验性钍反应堆,该反应堆并不向电网供电。美国的煤炭发电量占其总发电量的 15%。
theassassintherapist "The united states uses coal to generate 15% of its total power." ...Plus 42% natural gas, which is a fancier way of saying burning methane, another greenhouse gas and fossil fuel. Let's not pretend natural gas is any cleaner: methane, CH₄, when burnt releases a lot of carbon dioxide into the atmosphere.
美国的煤炭发电量占其总发电量的 15%。 此外,天然气发电量占比为 42%—— 而 “天然气” 不过是 “燃烧甲烷” 的好听的叫法,甲烷本身既是另一种温室气体,也属于化石燃料。我们别再假装天然气更清洁了:甲烷(化学式 CH₄)燃烧时,会向大气中释放大量二氧化碳。
squarexu Yes, China's electricity costs about 1/2 of US electricity. Also, I have read that China's electricity capacity is more than the US, Europe and India combined. So theoretically, if China is designing chips, they could use 2x to 3x the electricity of Nvidia chips and still come out at roughly the same cost if run in China.
是的,中国的电价约为美国的一半。此外,我还了解到,中国的电力装机容量超过了美国、欧洲和印度的总和。因此,从理论上讲,如果在中国设计芯片,即便其耗电量是英伟达芯片的 2 到 3 倍,在中国制造和运行时,其成本仍可能与在其他地方使用英伟达芯片大致相当。
MarcoGWR Yeah, China's solar electric is cheaper than cola now.
是的,中国的太阳能电力现在比煤炭发电更便宜。(译者注:这里cola应该是评论者打错了,正确的是coal煤炭,下面有回复就是在调侃他的)
Unfair_Cicada How do you compare electricity to cola?
你怎么会把电力和可乐拿来比较呢?
Big-Chungus-12 I wish they could just work together, China and the US could make real change faster
我真希望中美两国能携手合作,这样就能更快地带来真正的改变。
OnlineParacosm Me when I’m 12
我12岁的时候也这么想的
Big-Chungus-12 Nothing wrong with blind hope, even with the crushing weight of reality
盲目乐观并非过错,即便要承受现实的沉重压力。
KR4T0S Probably out of the question for now, Trumps persecution of immigrants has already seen many of them leave the US. Trump seems to think those roles will be fulfilled by his voters soon.
目前这事儿恐怕不太可能。特朗普对移民的打压已导致许多移民离开美国,而他似乎认为,这些空缺的岗位很快会由他的支持者来填补。
theassassintherapist Further back than this. The Republicans' Wolf Amendment back in 2011 is the reason why there are two space stations in orbit right now instead of both nations helping to improve the ISS.
比这更早的时候,2011 年共和党提出的《沃尔夫修正案》才是关键原因 —— 正是因为它,如今轨道上才会存在两座空间站,而非美中两国携手共同改进国际空间站。
Big-Chungus-12 I don’t think we’d ever see an administration that would ally ourselves with China except maybe if Newsom wins
我认为,我们恐怕永远不会看到有哪届美国政府会与中国结盟,除非或许纽森(加州州长)能赢得大选
refboy4 China spoiled that relationship by spending decades stealing all our IP, directly and repeatedly trying to breach security on critical infrastructure and government systems, etc…
中国通过数十年来窃取我们的所有知识产权,直接且反复试图破坏关键基础设施和政府系统的安全性等行为,破坏了这种关系。
HedgeMoney But a boy can dream.
但男孩总有梦想的权利
MarcoGWR Jesus, if you do believe in those shit fake news, you can never understand why China can rise.
天啊,要是你真信那些狗屁假新闻,你永远也搞不懂中国为什么能崛起。
refboy4 Dafuq you talking about?
你在胡说八道什么?
duncandun Only Americas allowed to do that!
只有美国人才能这么做!
BanditoBoom This doesn’t include all the hardware. Complete BS
这还不包括所有的硬件设备。纯属胡说八道。
imkindathere Why would it? The hardware is reusable
为什么要包括呢?这些硬件是可重复使用的
simulated-souls That does include the hardware. Most teams rent GPUs instead of buying them, so the rental costs are what matter. GPUs go for about 3 dollars an hour in America, so if you use their hardware numbers (512 GPUs x 80 hours), you get 122K, which is actually lower than the number they gave.
这其中是包含硬件的。大多数团队会租用GPU而非购买,因此租赁成本才是关键。 在美国,GPU的租赁费用约为每小时 3 美元。所以,若按他们给出的硬件数量(512 个GPU × 80 小时)计算,得出的费用是 12.2 万美元,这一数字实际上低于他们所提供的数额。
Familiar_Resolve3060 OpenAI bot dying today
OpenAI 的机器人今天崩了
EC36339 It's a small language model derived from an existing large language model. Of course it is cheap. SLMs have their use. For example, you can run them on a client and don't need a server. But all this hype about how cheap DeepSeek was to train and how they "stole" all the data are the usual nothing burgers for the uneducated masses. (I'm not even "educated" about LLMs and AI. I was just curious enough to want to know what DeepSeek actually is. Most people who have opinions don't even care about how clueless they are and how much more informed they could be after only 10 minutes of googling)
这是一个基于现有大型语言模型衍生而来的小型语言模型,成本低是理所当然的。 小型语言模型(SLMs)有其适用场景。比如,你可以在客户端运行它们,无需依赖服务器。但所有这些关于DeepSeek训练成本有多低、以及他们如何 “窃取” 所有数据的炒作,对那些缺乏相关知识的人来说,不过是常见的无稽之谈罢了。 (其实我本人对大型语言模型(LLMs)和人工智能也算不上 “懂行”,我只是出于好奇,想弄清楚DeepSeek究竟是什么。但大多数发表看法的人,根本不在意自己有多无知 —— 他们甚至不愿花 10 分钟谷歌搜索一下,让自己多了解一点相关信息。)
simulated-souls It is not a small language model. In fact it is one of the biggest at 685 billion parameters. It is cheap because they used a mixture of experts architecture, which only activates a fraction of the total parameters for every query.
它并非小型语言模型。事实上,凭借 6850 亿个参数,它是目前规模最大的语言模型之一。 它成本较低的原因在于采用了混合专家架构(Mixture of Experts, MoE)—— 该架构在处理每个查询时,仅会激活总参数中的一部分。
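To illustrate the mixture-of-experts point in the comment above, here is a minimal routing sketch; the dimensions, expert count and top-k value are invented for illustration and do not reflect DeepSeek's actual architecture:

```python
# Illustrative mixture-of-experts (MoE) routing in NumPy. Toy sizes only;
# not DeepSeek's real architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # expert FFN weights
router = rng.normal(size=(d_model, n_experts))                             # gating network

def moe_forward(x):
    logits = x @ router                        # routing scores, one per expert
    top = np.argsort(logits)[-top_k:]          # keep only the k best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only top_k of the n_experts matrices are used per token, so per-token
    # compute scales with the active parameters, not the total parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.normal(size=d_model)).shape)  # -> (16,)
```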
Tuubular Crazy how far down this comment is considering it’s just a logical explanation and not some baseless accusation of us tech companies being inefficient and money launderers
这条评论的排名居然这么靠后,实在是离谱 —— 要知道它只是在进行理性解释,并没有像某些言论那样毫无根据地指责美国科技公司效率低下、涉嫌洗钱。
zeelbeno "R1 was trained for a total of 80 hours on the 512 chip-cluster of H800 chips" Yeah... it's not really doing any training then. It basically just took all the other AIs and worked backwards. It's not doing anything to push forward AI.
“R1 模型在由 512 块 H800 芯片组成的芯片集群上,总计训练了 80 小时。” 是啊…… 这么看来,它其实算不上真正在做训练。 本质上,它不过是拿了其他所有人工智能模型的数据,反向推导罢了。 这种做法对推动人工智能发展毫无帮助。
PooInTheStreet Oh wow China makes another claim!
哇哦,中国又提出了一项主张!
Patriark Theft is cheap
“偷窃” 的成本很低。
Guinness Bullllllshiiiiiiit!
一派胡言!
Familiar_Resolve3060 Just like you
就像你一样。
finallytisdone Everything DeepSeek says is at best obfuscation. The fact that the media picked up on their original, obviously false statements was wild. They are using a combination of very creative accounting and outright lies.
DeepSeek的所有言论,往好里说都是在混淆视听。媒体居然会采信他们最初那些明显不实的说法,实在令人费解。他们无非是把极具误导性的财务计算和彻头彻尾的谎言混在一起用罢了。
squarexu This was submitted to Nature.
该研究成果已投稿至《自然》杂志。
swagdu69eme I don't believe them at all
我完全不相信他们。
bhaaad you can tell any numbers you want when you're straight up lying :D
你要是直接撒谎,想报什么数字就能报什么数字(笑)
thewackytechie BS. A ton of things left out.
胡说八道。好多东西都没提。
yehiko Honestly, I was using qwen at the start, but for anything other than basic stuff it's really shit compared to chatgpt. Like not even comparable for anything related to coding. I'm not a coder, but sometimes I need a snippet for personal stuff and chatgpt is so good at it and qwen/deepseek go around spewing shit all day
说实话,我一开始是用Qwen的,但除了处理一些基础内容,它跟 ChatGPT 比起来是真的差。比如在编程相关的内容上,两者根本没有可比性。我不是程序员,但有时候会需要一段代码片段来处理个人事务,而 ChatGPT 在这方面做得特别好,反观Qwen和DeepSeek,整天都在胡言乱语。
RogueHeroAkatsuki I cheer them on hard because we need competition and it would be amazing if they can stay competitive against big tech sleeping on piles of cash. However, let's be honest: if OpenAI or Google are spending hundreds of billions USD on AI, then it's impossible to keep up with them with only 300k. It's just too good to be true. Without doubt someone would already have thought of it, as it means billions more in dividends!
我真心为他们加油,因为我们需要竞争 —— 如果他们能顶住那些财大气粗、坐拥巨额资金的科技巨头,保持竞争力,那将会非常了不起。但说实话,倘若 OpenAI 或谷歌在人工智能领域投入了数千亿美元,仅凭 30 万美元是绝无可能跟上他们步伐的。这实在好得令人难以置信。毫无疑问,早就有人会打这个主意了,因为这意味着能多赚数十亿美元的股息!
lolwut778 Keep in mind that this figure probably doesn't include hardware needed, but still good news for LLM development.
要注意的是,这个数字很可能没有包含所需的硬件,但对大型语言模型的发展而言,这仍然是个好消息。
Familiar_Resolve3060 But when you consider OpenAI hardware laundering its not even a nick
但要是考虑到 OpenAI 的硬件洗钱行为,这甚至都不值一提。