DeepSeek V4 Pro has matched GPT-5.2 on an agent benchmark roughly 10 weeks after GPT-5.2's arrival. The frontier gap between US and Chinese LLMs is shrinking from "measured in years" to "measured in weeks."
What this is
The overseas evaluation team FoodTruck Bench has released its latest results. The benchmark is a 30-day agent test (an agent is an AI that can autonomously call tools to complete complex tasks) in which each model runs a simulated food truck using 34 tools. The tasks cover pricing, inventory, scheduling, and responding to the weather, testing a model's memory and its ability to make sustained decisions over time.
DeepSeek V4 Pro ranked fourth, with a median result less than 3% behind GPT-5.2's, making it the first Chinese model to join the benchmark's frontier group. Cost matters even more: on the same tasks, DeepSeek's API spend is only about one-seventeenth of GPT-5.2's. Against the similarly priced Grok 4.3, DeepSeek wins on consistency, wasting one-sixth as much food and serving 30% more meals per day. Xiaomi's MiMo v2.5 Pro also climbed to sixth place. For the first time, two Chinese models appeared in the top six, both priced under $3.50.
Industry view
The gap between US and Chinese frontier models has narrowed dramatically. The industry widely believed Chinese models trailed the US by about a year; on this benchmark, that lag has shrunk to roughly ten weeks. Meanwhile, Chinese teams' reliability in RAG (retrieval-augmented generation, a technique that lets an AI draw on external knowledge bases) and in tool calling is turning cost-effectiveness into a core moat.
Dissenting voices are still worth heeding, however. A benchmark's closed environment is not the same as a real business setting. Real operations involve complex compliance and privacy requirements, and an AI that finishes a test with "zero loans" will not necessarily avoid errors inside an enterprise ERP system. Furthermore, if DeepSeek's habitual low pricing becomes permanent, it could squeeze profit margins across the industry, leave small and mid-sized model companies with no room to survive, and ultimately reduce the ecosystem's diversity.
Impact on regular people
For enterprise IT: The cost of experimenting with agent deployments has fallen sharply. A budget that once covered a single GPT-based project could now fund more than a dozen DeepSeek-based ones, noticeably improving the expected ROI of enterprise digital transformation.
For individual workers: AI is getting both better and cheaper at complex operational tasks. Managers need to shift their focus faster, from "teaching AI how to do things" to "judging whether AI did it right."
For the consumer market: When a smartphone maker such as Xiaomi has an in-house LLM ranking near the top, future AI assistants on its devices could become faster, smarter, and free to use, potentially reshaping how ordinary people interact with their devices.