What Happened
A developer on r/LocalLLaMA successfully replaced Claude Code's cloud backend with a fully local llama.cpp server running Qwen3.5 27B (unsloth/UD-Q4_K_XL quantization). By redirecting ANTHROPIC_BASE_URL to http://127.0.0.1:8001 and disabling all telemetry flags, the setup runs entirely offline on Arch Linux with AMD Strix Halo hardware. The llama.cpp server was configured with a 65,536-token context window, flash attention, a Q8_0 KV cache, and ROCBLAS_USE_HIPBLASLT=1 for AMD GPU compatibility.
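The launch described above can be sketched as a shell session. This is a hedged reconstruction, not the poster's exact command: the model filename is illustrative, and flag spellings should be checked against your llama.cpp build (some versions use -fa instead of --flash-attn).

```shell
# AMD GPU compatibility flag reported in the post
export ROCBLAS_USE_HIPBLASLT=1

# Serve the quantized model with the settings from the post:
# 65536-token context, flash attention, Q8_0 KV cache, port 8001.
# The .gguf filename below is a placeholder, not the actual file name.
llama-server \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 65536 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --host 127.0.0.1 \
  --port 8001

# Point Claude Code at the local server instead of Anthropic's cloud
export ANTHROPIC_BASE_URL=http://127.0.0.1:8001
```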
Why It Matters
Claude Code is one of the most capable agentic coding tools available, but its cloud dependency creates cost, latency, and privacy concerns for indie developers and SMEs. This setup eliminates API billing entirely. Across 7 benchmark runs, the local setup achieved 8.37–9.71 tokens/second with peak context reaching 37.9K tokens, handling tasks from file operations to git cloning and multi-day planning with "Excellent" quality ratings. Key findings include:
- Tool chaining (multi-step file reads, git ops) works reliably at this model size
- Throughput degrades as context grows: 9.71 t/s at 23K tokens vs 8.37 t/s at 37.9K (a roughly 14% drop)
- Using claude/settings.json is more stable than .bashrc environment variables for telemetry control
- Setting CLAUDE_CODE_DISABLE_1M_CONTEXT=1 prevents the model from attempting context windows it cannot handle
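To put the throughput numbers in perspective, a quick back-of-envelope calculation on the figures reported in the post:

```python
# Back-of-envelope on the reported throughput: 9.71 t/s at 23K context
# vs 8.37 t/s at 37.9K. How long does a 1,000-token response take at each?
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    return tokens / tokens_per_second

t_23k = seconds_for(1000, 9.71)            # ~103 s per 1,000 tokens
t_38k = seconds_for(1000, 8.37)            # ~119 s per 1,000 tokens
slowdown_pct = 100 * (9.71 - 8.37) / 9.71  # ~13.8% throughput loss

print(f"{t_23k:.0f}s vs {t_38k:.0f}s per 1,000 tokens ({slowdown_pct:.1f}% slower)")
```

In other words, filling the context costs on the order of 15 extra seconds per 1,000 generated tokens: noticeable on long agentic sessions, but not a dealbreaker.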
Asia-Pacific Angle
Qwen3.5 27B is developed by Alibaba's Qwen team and is specifically strong on Chinese and multilingual codebases, making it a natural fit for developers in China and Southeast Asia building localized products. Running it locally also sidesteps API access restrictions that affect developers in regions with limited Anthropic availability. Chinese developers using domestic AMD or Hygon GPU hardware can apply the same ROCBLAS_USE_HIPBLASLT flag pattern. For Southeast Asian startups handling sensitive customer data under PDPA (Thailand), PDPC (Singapore), or similar frameworks, a fully offline coding assistant removes a significant compliance risk.
Action Item This Week
Download the unsloth/UD-Q4_K_XL quantization of Qwen3.5 27B from Hugging Face, start a llama.cpp server on port 8001 with the exact flags described above, and add the five environment variables to your claude/settings.json. Then run one real coding task fully offline before deciding whether this replaces your current Claude API spend.
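A minimal sketch of what that settings file might contain. The post names only ANTHROPIC_BASE_URL and CLAUDE_CODE_DISABLE_1M_CONTEXT explicitly; the three telemetry-related variables below are commonly documented Claude Code settings added here as assumptions, so verify them against your Claude Code version before relying on them.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8001",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "DISABLE_TELEMETRY": "1",
    "DISABLE_ERROR_REPORTING": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```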