DeepSeek V4 Pro Matches GPT-5.2: US-China AI Gap Shrinks to Ten Weeks

DeepSeek V4 Pro has matched GPT-5.2 on an agent benchmark while trailing it by only about ten weeks. The frontier gap between US and Chinese LLMs is shrinking from "measured in years" to "measured in weeks."

What this is

FoodTruck Bench, an overseas evaluation team, has released its latest results. The benchmark is a 30-day agent test (an agent being an AI that autonomously calls tools to complete complex tasks) in which models simulate running a food truck using 34 tools. It covers pricing, inventory, scheduling, and weather response, and tests a model's memory and sustained decision-making.

DeepSeek V4 Pro ranked fourth, with a median gap of less than 3% relative to GPT-5.2, making it the first Chinese model to enter the benchmark's frontier group. The cost matters even more: for equivalent tasks, DeepSeek's API cost is only about one-seventeenth of GPT-5.2's. Against the similarly priced Grok 4.3, DeepSeek wins on stability: roughly one-sixth the food waste and 30% more meals served per day. In addition, Xiaomi's MiMo v2.5 Pro climbed to sixth place. For the first time, two Chinese models appear in the top six, both priced under $3.50.

Industry view

We note that the gap between US and Chinese frontier models has narrowed drastically. The industry widely believed Chinese models were a year behind their US counterparts, but that lag has now shrunk to about ten weeks. Meanwhile, the stability Chinese teams have achieved in RAG (Retrieval-Augmented Generation, a technique that lets a model draw on external knowledge bases) and in tool invocation is turning cost-effectiveness into a core moat.

However, dissenting voices are worth heeding: the closed environment of a benchmark is not equivalent to a real commercial setting. Real business involves complex compliance and privacy requirements, and an AI that achieves "zero loans" in a test may still make errors in an enterprise ERP system. Furthermore, if DeepSeek's habitual low-pricing strategy becomes permanent, it could squeeze profit margins across the entire industry, leaving small and mid-sized model companies no room to survive and ultimately reducing the diversity of the ecosystem.

Impact on regular people

For enterprise IT: The trial-and-error cost of deploying agents has dropped significantly. A budget that previously covered only one GPT project can now fund more than a dozen DeepSeek projects, which visibly improves ROI expectations for enterprise digital transformation.
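
The budget claim follows from the roughly 17x cost ratio cited above. The Python sketch below works through the arithmetic under stated assumptions: the dollar figures and the `fixed_cost` overhead are hypothetical and chosen only for illustration; only the ratio of 17 comes from the article.

```python
# Cost ratio reported in the article: DeepSeek's API cost is about 1/17 of GPT-5.2's.
COST_RATIO = 17


def projects_affordable(budget: float, api_cost: float, fixed_cost: float = 0.0) -> int:
    """Return how many whole projects a budget covers.

    api_cost   -- API spend per project
    fixed_cost -- hypothetical non-API spend per project (integration, review, etc.)
    """
    return int(budget // (api_cost + fixed_cost))


# Hypothetical figures, for illustration only (not from the benchmark).
gpt_api_cost = 17_000.0
deepseek_api_cost = gpt_api_cost / COST_RATIO  # 1000.0
budget = gpt_api_cost                          # enough for exactly one GPT project

print(projects_affordable(budget, gpt_api_cost))              # 1
print(projects_affordable(budget, deepseek_api_cost))         # 17
print(projects_affordable(budget, deepseek_api_cost, 300.0))  # 13
```

If API spend were the only cost, the same budget would cover 17 projects. Any per-project overhead that does not shrink with the API price pulls that number down, which is consistent with the more cautious "more than a dozen" wording.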

For the individual workplace: AI is becoming both more capable and cheaper at executing complex operational tasks. Managers need to shift their focus more quickly from "teaching AI how to do things" to "judging whether AI did it right."

For the consumer market: With smartphone makers such as Xiaomi now placing their self-developed LLMs near the top of the rankings, future on-device AI assistants are likely to be faster, smarter, and free, which could substantially change how ordinary people interact with their devices.
