MoE
6 articles tagged with this topic
Tinygrad Tests MoE on Blackwell: Local AI Geeks Build Priciest Hardware Lego
Tinygrad MoE test on a Blackwell + M3 Ultra RDMA cluster (~2TB VRAM). A geek experiment: local-AI enthusiasts stress-test open-source frameworks with radical hardware.
Qwen3.6-27B Ties Coder-Next: Pick Models by Scenario, Not Benchmarks
20-hour test: Qwen3.6-27B ties the MoE Coder-Next overall but differs by task. Disabling "thinking mode" surprisingly boosts stability. Scenario fit beats benchmark scores.
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on Hugging Face.
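A quick sanity check on those file sizes: dividing file size by parameter count gives the average bits stored per weight. The sketch below assumes "GB" means decimal gigabytes (10^9 bytes) and ignores GGUF metadata overhead; the function name is hypothetical, not part of any GGUF tool.

```python
def implied_bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits per weight implied by a file size and a parameter count.

    Assumes decimal GB (1e9 bytes) and ignores metadata, so this is approximate.
    """
    # (GB * 1e9 bytes * 8 bits) / (params * 1e9) simplifies to GB * 8 / params
    return file_size_gb * 8 / params_billion


for name, size_gb in [("Q3_K_L", 110), ("Q8_0", 243)]:
    print(f"{name}: ~{implied_bits_per_weight(size_gb, 229):.2f} bits/weight")
```

The Q8_0 result lands near 8.5 bits per weight, which matches llama.cpp's Q8_0 block layout (32 int8 values plus one fp16 scale per block), so the quoted 243GB is consistent with a 229B-parameter model.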
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
A community benchmark of 37 local LLMs on a MacBook Air M5 (32GB) using llama-bench shows MoE models as the clear winners on speed-to-quality ratio.
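One common back-of-envelope explanation (not something the benchmark itself reports) is that token generation on unified-memory machines is largely memory-bandwidth bound, and an MoE model only reads its active experts' weights per token. The sketch below estimates a rough decode ceiling as bandwidth divided by bytes of active weights; the function name, the 100 GB/s bandwidth, the 4.5 bits/weight, and the 27B-dense vs 4B-active comparison are all hypothetical inputs chosen for illustration.

```python
def decode_tok_s_ceiling(active_params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode tokens/s for a bandwidth-bound model.

    Assumes every active weight is read once per token and ignores KV-cache
    reads, router overhead, and compute limits, so real numbers will be lower.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 100.0  # hypothetical memory bandwidth
dense = decode_tok_s_ceiling(27, 4.5, BANDWIDTH_GB_S)  # hypothetical dense 27B
moe = decode_tok_s_ceiling(4, 4.5, BANDWIDTH_GB_S)     # hypothetical MoE, ~4B active
print(f"dense ~{dense:.1f} tok/s, MoE ~{moe:.1f} tok/s, ratio {moe / dense:.2f}x")
```

Note that memory capacity still depends on total parameters: the whole MoE model must fit in the 32GB, even though only the active slice is read per token.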
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants running too slowly on DGX Spark (GB10).
APEX Quantization vs K-Quants: Why MoE Coding Models Need Different Compression
APEX quantization targets MoE architecture coherence layers at Q8, outperforming generic K-quants for multi-file coding agents.