MoE

6 articles tagged with this topic

Tinygrad Tests MoE on Blackwell: Local AI Geeks Build Priciest Hardware Lego

Tinygrad MoE test on a Blackwell + M3 Ultra RDMA cluster (~2TB VRAM). A geek experiment: local-AI enthusiasts stress-test open-source frameworks with radical hardware

May 3 · 2 min read

Qwen, Coder-Next

Qwen3.6-27B Ties Coder-Next: Pick Models by Scenario, Not Benchmarks

20-hour test: Qwen3.6-27B ties MoE Coder-Next overall but differs by task. Disabling "thinking mode" surprisingly boosts stability. Scenario fit beats

May 3 · 3 min read

MiniMax-M2.7, llama.cpp

MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon

MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.

Apr 12 · 3 min read

llama.cpp, Qwen

37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results

Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.

Apr 6 · 2 min read

Gemma 4, vLLM

Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes

Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.

Apr 6 · 1 min read

llama.cpp, Qwen Coder

APEX Quantization vs K-Quants: Why MoE Coding Models Need Different Compression

APEX quantization targets MoE architecture coherence layers at Q8, outperforming generic K-quants for multi-file coding agents.

Apr 6 · 2 min read