Running 27B LLM at 80 tokens/sec on dual GPUs shows accessible inference scaling

Hacker News·1w·iMil

iMil benchmarked Qwen 3.6 27B on RTX 5080 + RTX 3090, hitting 80 tokens/sec throughput. For makers building AI features on modest hardware, this suggests you don't need enterprise clusters—stacking older and newer GPUs can hit meaningful inference speeds without proportional cost.

Share𝕏Reddit

Related stories