What Happened

A developer on r/LocalLLaMA attempted to use Claude Opus 4 (via Anthropic's max plan) to play Elden Ring using Claude Code. The model successfully navigated the character creator but failed to exit the opening chapel — a task completed by millions of human players. The post directly challenges recent statements by Jensen Huang and Marc Andreessen that AGI has effectively been reached.

Why It Matters

This is a concrete, reproducible failure case that cuts through the marketing noise. AGI, by definition, requires general reasoning beyond training data. Elden Ring's opening area involves spatial reasoning, trial-and-error motor feedback loops, and adaptive problem-solving — none of which current LLMs handle reliably.

  • Claude Opus 4 is among the most capable commercially available models today
  • Failure on a task with no ambiguity (leave the room) exposes the gap between benchmark performance and real-world generalization
  • Indie devs building AI-powered products should calibrate expectations: current models excel at pattern-matching tasks, not novel physical or spatial reasoning

Asia-Pacific Angle

Chinese and Southeast Asian developers are under significant pressure from investors and clients who cite AGI headlines as justification for aggressive AI product timelines. This test case is useful evidence when pushing back on unrealistic scopes. Teams in Shenzhen, Singapore, and Jakarta building AI agents for logistics, gaming, or robotics should note: even frontier models like Claude Opus 4 cannot handle basic closed-loop sensorimotor tasks. Local alternatives like Qwen2.5 and DeepSeek-V3 face the same ceiling. Budget your architecture around what LLMs actually do well — text, code, retrieval — not AGI-level autonomy.

Action Item This Week

Pick one AI feature in your current roadmap that assumes autonomous reasoning beyond text. Run a 30-minute stress test with your actual model (Claude, GPT-4o, or a local Qwen deployment). Document where it fails. Use that failure log to reset scope with your team or client before you're two sprints deep into a dead end.