Gemma 4

12 articles tagged with this topic

Google Doubles Gemma 4 Speed — Speculative Decoding Goes Mainstream

Google's Gemma 4 MTP models use speculative decoding for up to 2x speed with zero quality loss, boosting local LLM practicality and lowering compute b

May 5 · 2 min read

Google · Gemma 4

Google Gemma 4 Fixes Chat Template — Local LLM Usability Inches Forward

Google fixed Gemma 4's chat template bug; community quantized versions updated. Not major news, but proves local AI usability inches up via detail ref

May 4 · 2 min read

Gemma 4 · MLX

Gemma 4 audio with MLX

Google's Gemma 4 E2B model can transcribe audio locally on macOS using MLX and a single uv run command.

Apr 13 · 3 min read

Gemma 4 · llama.cpp

Fixing Gemma 4 Tool Calls in llama.cpp: Root Causes Explained

Four bugs in llama.cpp's Gemma 4 chat template handling caused tool call results to crash or loop.

Apr 8 · 3 min read

Gemma 4 · Qwen3

Controlling Gemma 4 Thinking Tokens via System Prompts

Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, unlike Qwen-30B-A3B.

Apr 8 · 3 min read

Gemma 4 · LiteRT

Gemma 4 Has Hidden MTP Heads Disabled by Google at Launch

A developer found multi-token prediction weights inside Gemma 4's LiteRT files; Google confirmed MTP exists but was intentionally disabled.

Apr 7 · 4 min read

Gemma 4 · Google DeepMind

Gemma 4 31B Ranks Top-3 in Five European Languages on EuroEval

Gemma 4 31B ranks 1st in Finnish and 2nd in Danish, French, and Italian on the EuroEval multilingual leaderboard.

Apr 7 · 4 min read

Gemma 4 · llama.cpp

Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks

Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.

Apr 7 · 2 min read

Gemma 4 · Google DeepMind

Inside Google DeepMind's Gemma 4 Launch: What It Actually Took

A Reddit thread breaks down the engineering and logistics behind launching Gemma 4, Google DeepMind's open model.

Apr 6 · 1 min read

Gemma 4 · vLLM

Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes

Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.

Apr 6 · 1 min read

Gemma 4 · LiteRT

Run a Private AI Phone Agent On-Device with Gemma 4 and PokeClaw

PokeClaw runs Gemma 4 locally on Android to control any app—no cloud, no data leakage, no subscription.

Apr 6 · 2 min read

llama.cpp · Gemma 4

Gemma 4 llama.cpp Issues Resolved With Recent Fixes

Google's Gemma 4 models now run correctly in llama.cpp after critical fixes for output quality and crashes.

Apr 4 · 1 min read