中国AI领域最热门新商品:词元
The rise of China’s hottest new commodity: AI tokens
译文简介
由深度求索和MiniMax等企业研发的中国AI模型,在词元的使用量上已超越美国竞争对手。
正文翻译

题图:人工智能代理消耗的词元数量远超早期的聊天机器人,因此中国低成本生产词元的能力使其获得了新的竞争优势。
China is gaining ground in the global AI industry’s hottest commodity: tokens.
中国正在全球人工智能产业最热门的商品——词元领域取得进展。
Since February, Chinese AI models made by groups such as DeepSeek and MiniMax have overtaken US rivals in token consumption, according to OpenRouter data, which tracks these units of text, code or data processed by large language models.
据追踪大语言模型处理文本、代码或数据单元的OpenRouter平台数据显示,自二月起,由深度求索、MiniMax等企业研发的中国人工智能模型在词元处理量上已超越美国同类产品。
The shift points to a deeper change in the AI race, with Nvidia’s Jensen Huang saying this month that the production and use of the digital units will drive the AI economy. Because developers are charged per token, it doubles as both a proxy for adoption of models and a pricing battleground between AI companies.
这一转变标志着人工智能竞赛正发生更深层次的变化。英伟达首席执行官黄仁勋本月表示,词元的生产和使用将推动人工智能经济发展。由于开发者按词元数量付费,因此这既可作为模型采用率的衡量指标,也成为人工智能企业之间的定价战场。
As AI agents, such as those built on the open-source platform OpenClaw, consume vastly more tokens than earlier chatbots, the ability to cheaply produce tokens is reshaping global competition — and giving China a new edge.
随着基于开源平台OpenClaw等AI智能体消耗的词元数量远超早期聊天机器人,低成本生成词元的能力正在重塑全球竞争格局——这也为中国带来了新的竞争优势。
“If your agent is burning through millions of tokens a day, even a small per-token price difference becomes a significant line item,” said Will Liang, chief executive of Amplify AI Group, a Sydney-based technology consulting firm. “That’s a structural tailwind for Chinese labs, and it only grows as agentic adoption scales.”
“如果你的智能体每天消耗数百万个词元,即使每个词元的价格差异很小,也会成为一项重要的开支项目,”总部位于悉尼的技术咨询公司Amplify AI集团的首席执行官梁威尔表示。“这对中国实验室来说是一个结构性利好,而且随着智能体应用的规模化,这种优势只会增长。”
Chinese AI groups’ cost advantage stems from cheaper energy and more efficient models, allowing companies such as MiniMax and Moonshot to charge $2 to $3 per million output tokens, compared with about $15 for Anthropic’s Claude Sonnet 4.5 — a near sixfold gap.
中国AI企业的成本优势源于更便宜的能源和更高效的模型,这使得像MiniMax和月之暗面这样的公司能够对每百万个输出词元收取2至3美元的费用,而Anthropic的Claude Sonnet 4.5的收费约为15美元——差距接近六倍。

The difference becomes pronounced with AI agents, which consume far more tokens than chatbots. Summarising Shakespeare’s Hamlet might take about 30,000 tokens for a chatbot, but an AI agent can require up to 20mn on a minor coding task.
这种差异在使用AI智能体时变得尤为明显,因为智能体消耗的词元数量远超聊天机器人。总结莎士比亚的《哈姆雷特》对于聊天机器人可能只需要大约3万个词元,但一个AI智能体完成一个简单的编码任务就可能需要多达2000万个词元。
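The cost gap implied by these figures can be checked with simple arithmetic. The sketch below uses the article's numbers: $2-3 per million output tokens for MiniMax and Moonshot (a $2.50 mid-point is an assumption here) versus about $15 for Claude Sonnet 4.5, applied to the article's 20mn-token upper bound for an agentic coding task.

```python
# Per-token billing: cost = (tokens / 1mn) * price-per-million-tokens.
def task_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a task billed per output token."""
    return tokens / 1_000_000 * price_per_million

TASK_TOKENS = 20_000_000   # the article's upper bound for a minor agent coding task
CHINESE_MODEL = 2.5        # assumed mid-point of the $2-3 range (MiniMax, Moonshot)
CLAUDE_SONNET = 15.0       # Anthropic Claude Sonnet 4.5, per the article

print(task_cost(TASK_TOKENS, CHINESE_MODEL))  # 50.0  (dollars)
print(task_cost(TASK_TOKENS, CLAUDE_SONNET))  # 300.0 (dollars)
```

At agentic volumes, the near-sixfold per-token gap compounds into hundreds of dollars per task, which is the "significant line item" Liang describes.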
That is changing how AI developers choose to spend their money. Terry Zhang, a Hong Kong-based developer, said he now spends about $50 a day using Moonshot’s Kimi model for roughly 80 per cent of his work, reserving Anthropic’s Claude for more complex tasks.
这正在改变AI开发者的资金分配方式。常驻香港(特区)的开发者张特瑞表示,他现在每天花费约50美元使用月之暗面的Kimi模型来完成大约80%的工作,而将Anthropic的Claude留作处理更复杂任务。
“I used to call only Claude but now with an increasing amount of workload, using just Claude would cost me about $900 a day,” he said. “It’s too much and the mixed use of Kimi and Claude works well for me.”
“我以前只用Claude,但现在工作量越来越大,如果只用Claude,我每天要花费大约900美元,”他说。“这太多了,混合使用Kimi和Claude对我来说效果很好。”
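The savings from this kind of mixed use can be sketched with back-of-envelope arithmetic. The sketch below assumes the 80/20 split applies to token volume and uses the article's rough price ratio; the developer's actual bill depends on Kimi's real pricing and on how his tasks split, so the output is illustrative only.

```python
# Blended daily cost when 80% of token volume moves to a cheaper model.
CLAUDE_ONLY_DAILY = 900.0   # $/day if everything ran on Claude (from the quote)
CHEAP_FRACTION = 0.80       # assumed share of token volume routed to Kimi
PRICE_RATIO = 2.5 / 15.0    # cheap model's price as a fraction of Claude's (assumed)

blended = (CLAUDE_ONLY_DAILY * CHEAP_FRACTION * PRICE_RATIO    # cheap-model share
           + CLAUDE_ONLY_DAILY * (1 - CHEAP_FRACTION))         # remaining Claude share
print(round(blended, 2))  # 300.0 dollars/day instead of 900.0
```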
The trend is feeding through to revenues. MiniMax, whose M2.5 model is now ranked among the most used globally by token consumption, has seen token usage rise 476 per cent from a month ago as of March 20, according to OpenRouter.
这一趋势已开始体现在营收上。根据OpenRouter的数据,截至3月20日,其M2.5模型在全球词元使用量排名中位居前列的MiniMax,其词元使用量较一个月前增长了476%。
While OpenRouter accounts for only a fraction of the global model consumption, it is widely used as an industry indicator, as such data is scarce elsewhere.
尽管OpenRouter仅占全球模型消费量的一小部分,但由于此类数据在其他渠道十分稀缺,该平台被广泛用作行业风向标。

US groups are still growing rapidly as the overall market expands, with OpenAI, Anthropic and Google all reporting strong revenue growth and adoption. But lower-cost Chinese models have obtained an opening to gain ground among users around the world.
随着整体市场扩张,美国企业集团仍在快速增长,OpenAI、Anthropic和谷歌均报告了强劲的收入增长和用户采用率。但成本更低的中国模型已获得突破口,在全球用户中取得进展。
China’s token pricing advantage stems partly from the country’s vast investment in renewable energy. The Chinese government this month designated “computing-electricity synergy” a national priority in its 2026 work report, explicitly linking energy policy with AI competitiveness.
中国在词元定价上的优势部分源于该国对可再生能源的巨大投资。中国政府在本月的2026年工作报告中将“算电协同”列为国家优先事项,明确将能源政策与人工智能竞争力挂钩。
On the software side, Chinese groups have embraced efficient AI architectures, such as “mixture-of-experts” designs that reduce computational demand, sometimes at the expense of accuracy. This push for computing efficiency has been driven by a shortage of advanced chips in China due to US export controls.
在软件方面,中国团队采用了高效的人工智能架构,例如“混合专家”设计,这种设计降低了计算需求,有时以牺牲准确性为代价。这种对计算效率的追求,是由美国出口管制导致中国先进芯片短缺所驱动的。
There are technical constraints. Zhipu AI’s GLM-5 model briefly topped OpenRouter charts in February before usage surged beyond its compute capacity, causing delays and service degradation.
技术限制依然存在。智谱AI的GLM-5模型曾在2月短暂登顶OpenRouter排行榜,随后因使用量激增超出其计算承载能力,导致服务延迟和性能下降。
The company, which had to apologise and raise prices, saw its shares drop 22 per cent on the day, erasing more than $10bn in market value.
该公司不得不公开致歉并上调价格,其股价在事发当日暴跌22%,市值蒸发逾100亿美元。
“The model’s capability matters, but stable compute and service are equally indispensable,” said one veteran developer at Google. Google’s Gemini 3 Flash is ranked second among the top five most-used models this month, trailing behind MiniMax.
“模型能力固然重要,但稳定的计算资源和服务同样不可或缺,”谷歌一位资深开发者表示。谷歌的Gemini 3 Flash模型在本月使用量前五的模型中排名第二,仅次于MiniMax。
China’s tech giants have moved quickly to press their advantage. Earlier this month Alibaba announced the creation of Alibaba Token Hub, a new business group that will be led by chief executive Eddie Wu. The unit signals Alibaba’s view that token economics will define the next phase of AI competition.
中国科技巨头正迅速扩大这一优势。本月初,阿里巴巴宣布成立“阿里巴巴词元中心”(Alibaba Token Hub)事业群,该新业务集团将由首席执行官吴泳铭直接领导。这一组织架构调整彰显了阿里巴巴的判断:词元经济将定义人工智能竞争的下一个阶段。

“We are standing at the threshold of an AGI inflection point,” Wu wrote in an internal memo last week. “Billions of AI agents are poised to take on an ever-greater share of digital work, each powered by tokens generated by models, and these agents will increasingly become the primary interface between people and the digital world.”
“我们正站在通用人工智能的拐点门槛上,”吴泳铭在上周的内部备忘录中写道。“数以亿计的AI智能体即将承担越来越多的数字化工作,每个智能体都由模型生成的词元驱动,这些智能体将日益成为人与数字世界之间的主要交互入口。”
Whether China’s token advantage can persist remains unclear, especially as some companies remain wary of relying on models run on Chinese data centres.
中国的词元优势能否持续尚不明朗,尤其是一些公司仍对依赖运行于中国数据中心上的模型持谨慎态度。
“The geopolitical headwinds are significant, particularly for governments and regulated industries,” said Amplify’s Liang. “Regulators are asking harder questions about where data is processed and under whose jurisdiction it falls.”
“地缘政治逆风相当显著,对政府和受监管行业而言尤其如此,”Amplify的梁先生表示。“监管机构正对数据处理地点及其所属司法管辖权限提出更严苛的质询。”
评论翻译
20mn tokens for a minor coding task is a poor take. That is a huge amount of context, RAG embedding and prompt/response history to eat up 20mn tokens for a ‘minor task’. Most well-crafted minor tasks would take 250k-750k tokens.
一个简单的编码任务就消耗2000万个词元,这显然不合理。处理海量的上下文、RAG嵌入以及提示/响应历史,只为了要完成一个“简单任务”,就要消耗掉2000万个词元。而大多数精心设计的简单任务通常只需25万到75万个词元。
You can compare asking a question of a body of text (Shakespeare) to another (codebase) but the framing isn’t right here.
你可以将向某类文本(如莎士比亚作品)提问与向另一类文本(如代码库)提问进行比较,但这里的类比并不恰当。
@Macpak
I have been confused, because the concept of tokens is about getting something, such as buying services.
我一直感到困惑,因为“词元”这个概念常被用于购买服务等场景。
It is a unit of data: typically about four characters of text make one token.
它是一种数据单位:通常约4个字符的文本即为一个词元。
So for an AI activity, tokens can measure the amount of input and output.
因此,对于AI活动而言,词元可以衡量输入和输出的量。
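The four-characters rule of thumb mentioned above can be turned into a naive estimator. This is only a sketch: real tokenizers (BPE, SentencePiece) split text into learned subwords, so actual counts vary by model and by language.

```python
# Naive token estimator from the "~4 characters per token" rule of thumb.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count; real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Summarise Hamlet."))  # 4
```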
It would be interesting to know how ‘tokens per task’ vary by AI provider.
了解不同AI提供商的“每项任务所需词元数”有何差异,会很有意思。
@Ex non-dom
Basically US AI is massively overvalued vs Chinese AI? Latter equity trades at a fraction of valuation with more upside it seems + China has surplus power 40% cheaper per kwh vs US
简单来说,美国AI估值是不是比中国AI虚高太多了?后者的股票估值只有前者的零头,上涨空间看起来更大,而且中国电力过剩,每度电比美国便宜40%。
@silly fella
I don't understand this article because it seems to suggest that there is a separate token market independent of the AI model that issues it.
我搞不懂这篇文章,因为它似乎在暗示存在一个独立于发行词元的AI模型之外的词元市场。
But there is no token market, only an account that a user has with a particular model, and the tokens vary in price from model to model and from market to market, so that tokens for Gemini in India are probably cheaper than tokens in New York.
但根本没有什么词元市场,只有用户与特定模型绑定的账户,而且词元价格因模型而异、因市场而异,所以Gemini在印度的词元可能比纽约的要便宜。
What am I missing here?
我到底漏掉了什么?
@Badger
Am I the only one who finished the article without really grasping what these tokens are?
难道只有我一个人看完文章后还是没搞懂这些词元到底是什么吗?
@GejbdoxirbqnKf
It is a unit of inference work for the model provider.
这是大模型提供商系统内部进行推理工作的一个单位。
@Leonidas27
Bad idea. The Chinese electricity grid is incredibly dirty — carbon intensity of 530 gCO2e/kWh versus a global average of 370 (IEA, 2023).
这主意可不妙。中国电网的碳排放高得吓人——碳强度达每千瓦时530克二氧化碳当量,而全球平均值才370克(国际能源署,2023年数据)。
If you rely on Chinese models, you’re going to get smacked on Scope 2 emissions.
要是依赖中国的模型,你在范围二排放(注:指企业因外购电力、蒸汽、供热和制冷等能源产生的间接温室气体排放)上绝对会栽跟头。
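The Scope 2 arithmetic behind this comment is straightforward: emissions are electricity consumed times the grid's carbon intensity. The sketch below uses the commenter's intensity figures; the 1 GWh workload size is an assumption for illustration, not a number from the article.

```python
# Scope 2 (purchased electricity) emissions = kWh consumed * grid intensity.
def scope2_tonnes(kwh: float, intensity_g_per_kwh: float) -> float:
    """Tonnes of CO2e for a workload on a given grid (grams -> tonnes)."""
    return kwh * intensity_g_per_kwh / 1_000_000

WORKLOAD_KWH = 1_000_000  # 1 GWh of inference, assumed for illustration

print(scope2_tonnes(WORKLOAD_KWH, 530))  # 530.0 tonnes on China's grid (IEA 2023 figure)
print(scope2_tonnes(WORKLOAD_KWH, 370))  # 370.0 tonnes at the global average
```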
@silly fella
So the Chinese grid is not that dirty according to your figures, since it is less than twice as dirty as a global average to which it is contributing substantially.
这么说来,按照你的数据,中国市场也没那么脏嘛,毕竟它对全球平均值贡献巨大,但污染程度还不到全球平均水平的两倍。
@MMza
We use a technique in the industry called semantic routing to reduce costs. We create a database of vector embeddings that represent say 100 exemplar questions that we might send to models, and which provider the request should be sent to. Then we generate a vector embedding per prompt right before we’re about to send it to an LLM provider, see which exemplar question it is closest to by comparing distance using a vector DB, and route to that provider. This is a very cheap way to dynamically pick providers based on semantic meaning rather than just static routes.
我们业内采用一种称为语义路由的技术来降低成本。具体做法是建立一个向量嵌入数据库,其中包含大约100个示例问题——这些问题代表我们可能发送给模型的任务类型,并标注好每个请求应该发送给哪个服务提供商。在实际向大语言模型提供商发送提示前,我们会实时生成该提示的向量嵌入,通过向量数据库计算距离来匹配最接近的示例问题,随后将请求路由至对应的提供商。这种方法能基于语义动态选择服务商,成本极低,远比静态路由灵活。
This ability to represent semantic meaning using vector embeddings, and measuring their distance in higher dimensional space to find semantic similarity is, in itself, a fascinating bit of tech that isn’t much discussed in the press. It reveals how models think and store ideas, which is independent of text and human languages. A concept in any language would exist in roughly the same location in latent space for a multilingual model. There is a hint of a universal language there. Which isn’t surprising considering that LLMs were born out of an attempt to make a better universal translator.
通过向量嵌入表征语义,并在高维空间中测量距离以发现语义相似性——这项技术本身就非常迷人,可惜媒体很少深入讨论。它揭示了模型如何思考和存储概念,这种机制独立于文本和人类语言而存在。对于多语言模型而言,任何语言中的同一概念在潜在空间中的位置都大致相同。这暗示着某种通用语言的存在。考虑到大语言模型最初就脱胎于打造更好通用翻译器的尝试,这个发现其实并不令人意外。
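The semantic routing described above can be sketched in a few lines. In this toy version a hashed bag-of-words stands in for a real embedding model (which is the part that actually captures semantics), and the exemplar questions and provider names are illustrative; a production system would use a real embedding API and a vector database rather than a linear scan.

```python
# Toy semantic router: embed exemplar prompts once, then route each incoming
# prompt to the provider of its nearest exemplar by cosine similarity.
import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Stand-in embedding: hash each word into a fixed-size, L2-normalised vector."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalised, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Exemplar prompts mapped to the provider chosen for that kind of task
# (provider names here are illustrative, echoing the article's examples).
EXEMPLARS = [
    ("write a python function to parse csv files", "kimi"),
    ("summarise this legal contract clause", "claude"),
    ("fix this bug in my python code", "kimi"),
]
EXEMPLAR_VECS = [(embed(q), provider) for q, provider in EXEMPLARS]

def route(prompt: str) -> str:
    """Return the provider of the semantically closest exemplar."""
    vec = embed(prompt)
    return max(EXEMPLAR_VECS, key=lambda e: cosine(vec, e[0]))[1]

print(route("write a python function to merge csv files"))  # kimi
```

The embedding for each incoming prompt is cheap compared with an LLM call, which is why this dynamic routing costs so little relative to the spend it steers.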
@Buzzh
How does one associate the best provider for a given exemplar question? Is it just a matter of sending the exemplar to all of the providers and then grading the response and tokens used?
如何为给定的示例问题匹配最佳服务提供商?是不是只需要把示例发给所有提供商,然后根据回复质量和所用词元数来打分就行?
@Irony26
I love that we're talking about Token Economics and it has nothing to do with crypto.
我很高兴我们现在在讨论词元经济学,而它与加密货币无关。
@Timochka
Now that's an idea! Storing prompt cache on a blockchain!
这主意真绝了!把提示词缓存存到区块链上!
Be right back - I've got VCs to meet... I'll work out whether it'd actually be any use for anything after we've sold to OpenAI...
先失陪一下——我得去和风投们碰个头……等卖给OpenAI之后,我再琢磨这玩意儿到底有啥实际用处……
@TheGruntingEchidna
Chinese models will be regulated out of use in the US for security reasons. But that will surely hinder progress, as has already happened with EVs. Instead of boosting an electrification strategy, the US is returning to burning gas. The backwardness affects everything electric, including batteries. This point is critical, as drones are making almost everything else regarding “defense” obsolete.
出于安全原因,中国的模型将在美国受到监管限制而被禁用。但这无疑会阻碍技术进步,就像电动车领域已经发生的情况一样。他们非但没有推进电气化战略,反而在回归燃油车。这种倒退影响了所有电气化领域,包括电池技术。这一点至关重要,因为无人机正在让几乎所有其他“国防”相关技术显得过时。
@Scaled
This might not be an accurate representation; Claude almost certainly minimizes token use by running what it can locally. These other models are likely not doing the same local optimizations.
这可能不是一个准确的描述,Claude几乎肯定会在本地运行尽可能多的计算来最小化词元消耗。其他模型很可能没有做同样的本地优化。