Bonsai 1.7B ternary model hits 442T/s on M4 Max
Hacker News·2w·hhuytho
A new quantized model achieves notably fast inference on consumer hardware by using ternary weights instead of standard formats. For indie developers building on-device AI, this kind of performance on a laptop processor could meaningfully reduce infrastructure costs.
Original story
Read the original on Hacker News