What Happened
A developer on r/LocalLLaMA successfully replaced Claude Code's cloud backend with a fully local llama.cpp server running Qwen3.5 27B (unsloth/UD-Q4_K_XL quantization). By redirecting ANTHROPIC_BASE_URL to http://127.0.0.1:8001 and disabling all telemetry flags, the setup runs entirely offline on Arch Linux with AMD Strix Halo hardware. The llama.cpp server was configured with a 65,536-token context window, flash attention, a Q8_0 KV cache, and ROCBLAS_USE_HIPBLASLT=1 for AMD GPU compatibility.
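The launch described above can be sketched as a shell session. This is a hedged reconstruction, not the poster's exact command: the model filename is illustrative, and flag spellings should be checked against your llama.cpp build (some versions use -fa instead of --flash-attn).

```shell
# AMD GPU compatibility flag reported in the post
export ROCBLAS_USE_HIPBLASLT=1

# Serve the quantized model with the settings from the post:
# 65536-token context, flash attention, Q8_0 KV cache, port 8001.
# The .gguf filename below is a placeholder, not the actual file name.
llama-server \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 65536 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --host 127.0.0.1 \
  --port 8001

# Point Claude Code at the local server instead of Anthropic's cloud
export ANTHROPIC_BASE_URL=http://127.0.0.1:8001
```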
Why It Matters
Claude Code is one of the most capable agentic coding tools available, but its cloud dependency creates cost, latency, and privacy concerns for indie developers and SMEs. This setup eliminates API billing entirely. Across 7 benchmark runs, the local setup achieved 8.37–9.71 tokens/second with peak context reaching 37.9K tokens, handling tasks from file operations to git cloning and multi-day planning with "Excellent" quality ratings. Key findings include:
- Tool chaining (multi-step file reads, git ops) works reliably at this model size
- Throughput degrades as context grows: 9.71 t/s at 23K tokens vs 8.37 t/s at 37.9K (a roughly 14% drop)
- Using claude/settings.json is more stable than .bashrc environment variables for telemetry control
- Setting CLAUDE_CODE_DISABLE_1M_CONTEXT=1 prevents the model from attempting context windows it cannot handle
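To put the throughput numbers in perspective, a quick back-of-envelope calculation on the figures reported in the post:

```python
# Back-of-envelope on the reported throughput: 9.71 t/s at 23K context
# vs 8.37 t/s at 37.9K. How long does a 1,000-token response take at each?
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    return tokens / tokens_per_second

t_23k = seconds_for(1000, 9.71)            # ~103 s per 1,000 tokens
t_38k = seconds_for(1000, 8.37)            # ~119 s per 1,000 tokens
slowdown_pct = 100 * (9.71 - 8.37) / 9.71  # ~13.8% throughput loss

print(f"{t_23k:.0f}s vs {t_38k:.0f}s per 1,000 tokens ({slowdown_pct:.1f}% slower)")
```

In other words, filling the context costs on the order of 15 extra seconds per 1,000 generated tokens: noticeable on long agentic sessions, but not a dealbreaker.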
Asia-Pacific Angle
Qwen3.5 27B is developed by Alibaba's Qwen team and is specifically strong on Chinese and multilingual codebases, making it a natural fit for developers in China and Southeast Asia building localized products. Running it locally also sidesteps API access restrictions that affect developers in regions with limited Anthropic availability. Chinese developers using domestic AMD or Hygon GPU hardware can apply the same ROCBLAS_USE_HIPBLASLT flag pattern. For Southeast Asian startups handling sensitive customer data under PDPA (Thailand), PDPC (Singapore), or similar frameworks, a fully offline coding assistant removes a significant compliance risk.
Action Item This Week
Download the unsloth/UD-Q4_K_XL quantization of Qwen3.5 27B from Hugging Face, start a llama.cpp server on port 8001 with the exact flags described above, and add the five environment variables to your claude/settings.json. Then run one real coding task fully offline before deciding whether this replaces your current Claude API spend.
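A minimal sketch of what that settings file might contain. The post names only ANTHROPIC_BASE_URL and CLAUDE_CODE_DISABLE_1M_CONTEXT explicitly; the three telemetry-related variables below are commonly documented Claude Code settings added here as assumptions, so verify them against your Claude Code version before relying on them.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8001",
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "DISABLE_TELEMETRY": "1",
    "DISABLE_ERROR_REPORTING": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```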