Google releases Gemma 4 QAT models for efficient on-device inference

Hacker News·1mo·Google

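Google has shipped quantization-aware training (QAT) variants of Gemma 4, designed to run efficiently on consumer hardware with minimal accuracy loss. Because the low-precision rounding is simulated during training, the weights learn to tolerate it, so the quantized models stay closer to full-precision quality than models quantized only after training. For indie developers building AI features, this means smaller model footprints and faster inference on laptops and mobile devices. That matters if you're shipping LLM capabilities without relying on hosted API calls.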
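The announcement doesn't spell out Google's exact QAT recipe. As a generic illustration only, here is a minimal sketch of the "fake quantization" step that QAT inserts into the forward pass. It assumes symmetric, per-tensor int4 quantization. The helper names `fake_quantize` and `footprint_gb`, and the 4-billion-parameter size in the usage lines, are hypothetical and do not describe Gemma 4 itself. A real QAT setup also needs a straight-through estimator so that gradients can pass through `round()`; that part is omitted here.

```python
# Generic QAT illustration (assumption: symmetric per-tensor int4).
# Not Gemma's actual training recipe; names are hypothetical.

def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Round weights to signed `bits`-bit levels, then map back to floats.

    During QAT the forward pass uses these rounded values, so training
    adapts the weights to the rounding error they will see at inference.
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for int4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)              # nothing to scale
    scale = max_abs / qmax                # one scale for the whole tensor
    out = []
    for w in weights:
        q = round(w / scale)              # nearest integer level
        q = max(-qmax - 1, min(qmax, q))  # clamp to the signed int range
        out.append(q * scale)             # dequantize back to float
    return out


def footprint_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (ignores scales and metadata)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    print(fake_quantize([1.0, -0.4, 0.3]))
    # Hypothetical 4B-parameter model: 16-bit vs. 4-bit weights.
    print(footprint_gb(4e9, 16))  # 8.0
    print(footprint_gb(4e9, 4))   # 2.0
```

The footprint arithmetic shows why 4-bit weights matter for on-device use: storage drops by roughly 4x relative to 16-bit weights, before accounting for the per-tensor or per-group scales a real format also stores.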