Speculative Decoding

1 article tagged with this topic

Google Doubles Gemma 4 Speed — Speculative Decoding Goes Mainstream

Google's Gemma 4 MTP models use speculative decoding for up to 2x speed with zero quality loss, boosting local LLM practicality and lowering compute b

May 5 · 2 min read