Claude Opus 4 Fails Elden Ring: A Reality Check on AGI Claims

What Happened

A developer on r/LocalLLaMA tried to get Claude Opus 4 (on Anthropic's Max plan) to play Elden Ring through Claude Code. The model made it through the character creator but could not exit the opening chapel, a task millions of human players have completed. The post directly challenges recent statements by Jensen Huang and Marc Andreessen that AGI has effectively been reached.

Why It Matters

This is a concrete, reproducible failure case that cuts through the marketing noise. By most definitions, AGI requires reasoning that generalizes beyond the training data. Elden Ring's opening area demands spatial reasoning, trial-and-error feedback loops between action and observation, and adaptive problem-solving, none of which current LLMs handle reliably.

Claude Opus 4 is among the most capable commercially available models today.
Failure on an unambiguous task (leave the room) exposes the gap between benchmark performance and real-world generalization.
Indie devs building AI-powered products should calibrate expectations: current models excel at pattern-matching tasks, not novel physical or spatial reasoning.

Asia-Pacific Angle

Chinese and Southeast Asian developers are under significant pressure from investors and clients who cite AGI headlines to justify aggressive AI product timelines. This test case is useful evidence when pushing back on unrealistic scopes. Teams in Shenzhen, Singapore, and Jakarta building AI agents for logistics, gaming, or robotics should note that even a frontier model like Claude Opus 4 failed a basic closed-loop sensorimotor task here. Local alternatives such as Qwen2.5 and DeepSeek-V3 should be expected to hit the same ceiling. Budget your architecture around what LLMs actually do well (text, code, retrieval), not AGI-level autonomy.

Action Item This Week

Pick one AI feature in your current roadmap that assumes autonomous reasoning beyond text. Run a 30-minute stress test with your actual model (Claude, GPT-4o, or a local Qwen deployment). Document where it fails. Use that failure log to reset scope with your team or client before you're two sprints deep into a dead end.
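One way to keep the stress test honest is to script it so that every attempt, including crashes, ends up in the failure log. The sketch below is a minimal illustration, not a prescribed harness: `stress_test`, `Attempt`, `run_task`, `check`, and `failures_as_json` are hypothetical names, and `run_task` stands in for whatever call reaches your actual model.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Attempt:
    task: str       # the prompt or scenario given to the model
    passed: bool    # result of your own pass/fail check
    detail: str     # model output, or the error if the call crashed
    seconds: float  # wall-clock time spent on this attempt


def stress_test(
    tasks: List[str],
    run_task: Callable[[str], str],     # hypothetical: wraps your model call
    check: Callable[[str, str], bool],  # hypothetical: (task, output) -> passed?
    budget_seconds: float = 30 * 60,    # the 30-minute time box
) -> List[Attempt]:
    """Run tasks in order until the time budget is spent, logging every outcome."""
    log: List[Attempt] = []
    start = time.monotonic()
    for task in tasks:
        if time.monotonic() - start >= budget_seconds:
            break  # time box used up; remaining tasks are not attempted
        t0 = time.monotonic()
        try:
            output = run_task(task)
            passed, detail = check(task, output), output
        except Exception as exc:
            # A crash is recorded as a failure rather than silently skipped.
            passed, detail = False, f"error: {exc!r}"
        log.append(Attempt(task, passed, detail, time.monotonic() - t0))
    return log


def failures_as_json(log: List[Attempt]) -> str:
    """Serialize only the failed attempts, ready to share with a team or client."""
    return json.dumps([asdict(a) for a in log if not a.passed], indent=2)
```

To use it, pass a list of scenarios drawn from the roadmap feature, a `run_task` function that calls your deployed model, and a `check` function that encodes what success means for that feature. The time budget is only checked between tasks, so a single slow call can overrun it.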
