Running 27B LLM at 80 tokens/sec on dual GPUs shows accessible inference scaling
Hacker News·1w·iMil
iMil benchmarked Qwen 3.6 27B on an RTX 5080 paired with an RTX 3090 and reached 80 tokens/sec. For makers building AI features on modest hardware, this suggests you don't need an enterprise cluster: pairing an older consumer GPU with a newer one can deliver usable inference speeds on a 27B model without a matching jump in cost.