Google releases Gemma 4 12B, a single multimodal model for text and images

Hacker News·6d·Google

Google's new Gemma 4 12B handles both text and image understanding in a single model, without separate encoders, and is aimed at developers building on-device or cost-constrained AI applications. For indie makers, this means multimodal features can be deployed without juggling multiple model architectures or maintaining complex pipelines.
