Google releases multi-token prediction drafters for Gemma 4

hackernews·2w·Google

Google open-sourced multi-token prediction drafters for Gemma 4: smaller draft models that propose several tokens at once to speed up inference, reducing latency without sacrificing output quality. For indie developers running local LLMs or self-hosted inference, this means faster response times and lower compute costs on modest hardware.
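To make the draft-then-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding. It is not Google's implementation and does not use any Gemma API: `target_next`, `draft_next`, `toy_target`, and `toy_draft` are hypothetical stand-ins for a large model and a small drafter, and tokens are plain integers. The sketch only illustrates why the output can match the large model's output while the large model is invoked fewer times.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical "model": context -> next token


def greedy_generate(target_next: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def speculative_generate(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Draft k tokens cheaply, then let the target accept or correct them.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round would be one batched forward pass of the
    target model; here it is simulated with per-position calls.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1. The drafter proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target checks the proposal position by position.
        rounds += 1
        ctx = list(out)
        for tok in proposal:
            expected = target_next(ctx)
            if expected != tok:
                # First mismatch: keep the target's token and discard the rest.
                out.append(expected)
                break
            out.append(tok)
            ctx.append(tok)
    return out[: len(prompt) + max_new_tokens], rounds


def toy_target(ctx: List[Token]) -> Token:
    # Assumed toy behaviour: count upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Assumed toy behaviour: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_generate(toy_target, toy_draft, [0], 12)
    print(tokens, "target rounds:", rounds)
```

Because every drafted token is checked against the target's own choice, the result is identical to plain greedy decoding with the target; the saving comes from accepting several tokens per verification round. Production systems typically also sample rather than decode greedily and emit a bonus token when the whole draft is accepted, which this sketch omits.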