The Anubis-OSS leaderboard updated its data this week: 371 benchmark submissions, 218 competing models, and 10 Apple chips on the board. The ecosystem for running open-source models locally has grown large enough to need a serious leaderboard to measure it.

What this is

Anubis-OSS is a community leaderboard that measures how well open-source large language models run on local hardware (think smartphone benchmarking software, but for AI models' real performance on your own machine). Its core question: without cloud computing power, which models can a local machine actually run, how fast, and how well?
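
To make "how fast" concrete, here is a minimal throughput sketch using the llama-cpp-python library, one common way to run GGUF models locally. The model file path is hypothetical, and this is an illustration of the kind of tokens/s number such leaderboards report, not the leaderboard's own harness:

```python
# Minimal tokens/s measurement with llama-cpp-python
# (pip install llama-cpp-python). The GGUF path below is a
# placeholder; substitute any locally downloaded model file.
import time
from llama_cpp import Llama

llm = Llama(model_path="./qwen2-7b-instruct-q4_k_m.gguf",
            n_ctx=2048, verbose=False)

prompt = "Explain in two sentences why local inference speed matters."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion response includes a token count in its usage field.
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s "
      f"-> {generated / elapsed:.1f} tokens/s")
```

Real benchmark suites control prompt length, context size, and quantization far more carefully, but the headline number comes down to this ratio.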

These numbers are worth unpacking. 218 models means the open-source community, which two years ago was still debating whether LLaMA was usable at all, is now a crowded field that demands side-by-side comparison. 10 Apple chips means the M-series (Apple's custom Mac processors, whose unified memory architecture gives them a natural advantage for running large models) is no longer a geek experiment but a hardware option formally integrated into the benchmarking system. And 371 submissions means the community isn't just putting a name on the board and leaving; people are repeatedly tweaking parameters, swapping hardware, and pushing for higher scores.
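
The unified-memory advantage is easy to quantify: model weights must fit in memory the GPU can reach, and on M-series chips the CPU and GPU share a single pool. A back-of-the-envelope sizing sketch in plain Python (standard arithmetic only; real runtimes also need headroom for the KV cache and activations, so treat these figures as a floor):

```python
# Approximate memory footprint of quantized model weights:
# parameters * bits-per-weight / 8 bytes, expressed in GiB.
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for params in (7, 14, 70):
    for bits in (16, 8, 4):
        print(f"{params:>3}B @ {bits:>2}-bit: "
              f"~{weight_gib(params, bits):6.1f} GiB")
```

A 7B model at 4-bit quantization needs roughly 3.3 GiB of weights, which is why it fits comfortably on consumer Macs, while a 70B model at 16-bit is out of reach for almost all of them.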

Industry view

Optimists believe the emergence of such leaderboards is a sign of open-source models maturing. When users can compare "how many tokens/s my M2 Max gets running Qwen2-7B" in a single table, the decision-making cost of local deployment drops sharply. This matters most for enterprise intranet deployments and scenarios where data cannot be sent to the cloud.
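
That table lookup is, in practice, a simple filter. A sketch assuming a hypothetical leaderboard.csv export with columns model, chip, quant, and tokens_per_s (the real Anubis-OSS schema may differ):

```python
# Filter a hypothetical leaderboard export for usable configurations
# on a given chip. Column names and the CSV file are assumptions.
import csv

MIN_TPS = 20.0  # a subjective "comfortable for chat" threshold

with open("leaderboard.csv", newline="") as f:
    rows = list(csv.DictReader(f))

candidates = [r for r in rows
              if r["chip"] == "M2 Max"
              and float(r["tokens_per_s"]) >= MIN_TPS]

for r in sorted(candidates, key=lambda r: -float(r["tokens_per_s"])):
    print(r["model"], r["quant"], r["tokens_per_s"])
```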

But we also see two risks. First, leaderboards inherently encourage optimizing for the benchmark; a model that scores well isn't necessarily the most useful in real-world work. Second, the mainstream models deployed locally today are still concentrated in the 7B-14B parameter range, and no leaderboard can close the gap with GPT-4-level capability. As one community member bluntly put it: "The benchmarking ecosystem is inflating, but most people's actual need is still API calls, not self-hosted setups."

Impact on regular people

For enterprise IT: Local deployment of open-source models now has quantifiable selection criteria, allowing data-sensitive industries (finance, healthcare) to evaluate "no-cloud" solutions with greater confidence.

For individual professionals: Apple chips being formally included in the benchmarks means the MacBook Pro in your hands is shifting from "office tool" to "AI workstation." Professionals who understand local deployment will have more tools to choose from.

For the consumer market: The livelier the leaderboard, the easier it is for local AI tools to break into the mainstream. But consumers should stay wary: benchmark scores are not user experience; don't let the numbers lead you astray.