GGUF

9 articles tagged with this topic

Google Gemma 4 Fixes Chat Template — Local LLM Usability Inches Forward

Google fixed Gemma 4's chat template bug; community quantized versions updated. Not major news, but proves local AI usability inches up via detail ref

May 42 min read

MistralUnsloth

Mistral Local GGUF Bug Fixed — Open Source QA Gaps Are Bigger Than You Think

Mistral Medium 3.5 GGUF files corrupted, community-fixed. Reveals open source QA gap: APIs tested, local formats not—impacts enterprise deployments.

May 22 min read

UnslothQwen3.6

Qwen3.6 GGUF Benchmarks

Un sloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 pareto frontier comparisons.

Apr 173 min read

Gemma- 4Qwen3.5

Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga

Oobabooga published 5 benchmark reports covering 70-90 GGUF quants each for Gemma 4 and Qwen 3.5 models using KL Divergence methodology.

Apr 153 min read

Qwen3.5GGUF

Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores

KLD benchmarks across community GGUF quants show Q8_0 variants cluster near 0.001 KLD, with quality degrading shar ply below Q5.

Apr 143 min read

llama.cppAndroid

端侧AI 模型部署实战五(Android大模型加载)

Step-by-step JNI bridge implementation for running quantized LLMs on Android using llama.cpp.

Apr 143 min read

UnslothMiniMax-M2.7

Unsloth Releases Full GGUF Quant Suite for MiniMax M2.7

Unsloth uploads 22 GGUF quantizations of MiniMax M2.7, ranging from 1-bit (60.7 GB) to BF16 (457 GB).

Apr 123 min read

MiniMax-M2.7llama.cpp

MiniMax-M1 229B MoE Gets First GGUF Quants for Apple Silicon

MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.

Apr 123 min read

Gemma 4llama.cpp

Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks

Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.

Apr 72 min read