Gemma 4
12 articles tagged with this topic
Google Doubles Gemma 4 Speed — Speculative Decoding Goes Mainstream
Google's Gemma 4 MTP (multi-token prediction) models use speculative decoding for up to 2x faster generation with no loss in output quality, making local LLMs more practical and lowering compute b…
Google Gemma 4 Fixes Chat Template — Local LLM Usability Inches Forward
Google fixed a bug in Gemma 4's chat template, and community quantized versions have been updated. Not major news, but it shows local AI usability improving through incremental detail ref…
Gemma 4 audio with MLX
Google's Gemma 4 E2B model can transcribe audio locally on macOS using MLX and a single uv run command.
Fixing Gemma 4 Tool Calls in llama.cpp: Root Causes Explained
Four bugs in llama.cpp's handling of the Gemma 4 chat template caused crashes or infinite loops when processing tool call results.
Controlling Gemma 4 Thinking Tokens via System Prompts
Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, something that works more dependably with Qwen3-30B-A3B.
Gemma 4 Has Hidden MTP Heads Disabled by Google at Launch
A developer found multi-token prediction weights inside Gemma 4's LiteRT files; Google confirmed MTP exists but was intentionally disabled.
Gemma 4 31B Ranks Top-3 in Five European Languages on EuroEval
Gemma 4 31B ranks 1st in Finnish and 2nd in Danish, French, and Italian on the EuroEval multilingual leaderboard.
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.
Inside Google DeepMind's Gemma 4 Launch: What It Actually Took
A Reddit thread breaks down the engineering and logistics behind launching Gemma 4, Google DeepMind's open model.
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.
Run a Private AI Phone Agent On-Device with Gemma 4 and PokeClaw
PokeClaw runs Gemma 4 locally on Android to control any app—no cloud, no data leakage, no subscription.
Gemma 4 llama.cpp Issues Resolved With Recent Fixes
Google's Gemma 4 models now run correctly in llama.cpp after critical fixes for degraded output quality and crashes.