Best for: Reasoning
Best LLM for Reasoning
Ranked on MMLU-Pro, GPQA, and AIME. Price is only a tiebreaker, because reasoning quality dominates for reasoning-heavy work.
Updated May 2026. Top 3 this month: R1 0528, Qwen3.5 Plus 2026-02-15, Qwen3.5 397B A17B.
How we rank
Reasoning workloads (math, logic, science, multi-step planning) disproportionately reward top-tier frontier models: the gap between the best and second-best model can be as large as 20 accuracy points. We therefore weight reasoning benchmarks heavily and use price only as a tiebreaker.
Our full methodology is published on the methodology page.
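As a rough illustration of "price only as a tiebreaker", the sketch below sorts candidates by a reasoning score and falls back to price only when scores are exactly equal. The `Candidate` type, the `reasoning_score` field, the model names, and every number in it are hypothetical placeholders, not our actual scores or weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    reasoning_score: float  # hypothetical composite benchmark score, 0-100
    output_price: float     # USD per 1M output tokens


def rank(candidates):
    # Higher score always wins; the lower price only breaks exact score ties.
    return sorted(candidates, key=lambda c: (-c.reasoning_score, c.output_price))


# Placeholder data for illustration only.
candidates = [
    Candidate("model-a", 71.0, 2.00),
    Candidate("model-b", 71.0, 1.00),
    Candidate("model-c", 64.5, 0.30),
]
ranked = [c.name for c in rank(candidates)]
```

Here `model-b` is placed ahead of `model-a` because the two tie on score and `model-b` is cheaper, while `model-c` stays last despite being the cheapest.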
Pillars and weights:
Full ranking
| Rank | Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) |
|---|---|---|---|---|---|
| 1 | R1 0528 | DeepSeek | $0.50 | $2.15 | 163,840 |
| 2 | Qwen3.5 Plus 2026-02-15 | Qwen | $0.26 | $1.56 | 1,000,000 |
| 3 | Qwen3.5 397B A17B | Qwen | $0.39 | $2.34 | 262,144 |
| 4 | MiniMax M2.1 | MiniMax | $0.29 | $0.95 | 196,608 |
| 5 | Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 1,000,000 |
| 6 | MiMo-V2-Flash | Xiaomi | $0.09 | $0.29 | 262,144 |
| 7 | Qwen3.5-122B-A10B | Qwen | $0.26 | $2.08 | 262,144 |
| 8 | Qwen3.5-27B | Qwen | $0.20 | $1.56 | 262,144 |
| 9 | Olmo 3 32B Think | Allen AI | $0.15 | $0.50 | 65,536 |
| 10 | Qwen3.5-35B-A3B | Qwen | $0.16 | $1.30 | 262,144 |
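To turn the per-million-token prices above into a per-request figure, a minimal sketch follows. The prices are copied from three rows of the table; the `request_cost` helper and the token counts in the example are assumptions chosen for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "R1 0528": (0.50, 2.15),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "MiMo-V2-Flash": (0.09, 0.29),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical reasoning-heavy request: short prompt, long answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 8_000):.4f}")
```

Because reasoning output tends to be much longer than the prompt, the output price usually dominates the total.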
Field notes
Turn on native reasoning mode if the model offers it; the accuracy gains are real.
Reasoning mode uses more tokens, so budget accordingly.
Put a cheap model and a reasoning model behind a router to control cost.
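The router idea in the last note could be sketched roughly as follows. Everything here is hypothetical: the keyword heuristic in `looks_hard`, the 150-word threshold, and the two stub model functions stand in for whatever classifier and provider clients you actually use.

```python
from typing import Callable

# Hypothetical hints that a prompt needs multi-step reasoning.
HARD_HINTS = ("prove", "derive", "step by step", "optimize", "how many")


def looks_hard(prompt: str) -> bool:
    """Crude placeholder heuristic: very long prompts or reasoning keywords."""
    text = prompt.lower()
    return len(text.split()) > 150 or any(hint in text for hint in HARD_HINTS)


def route(prompt: str,
          cheap: Callable[[str], str],
          reasoning: Callable[[str], str]) -> str:
    # Only prompts flagged as hard reach the more expensive reasoning model.
    model = reasoning if looks_hard(prompt) else cheap
    return model(prompt)


# Stub models so the sketch runs without any provider SDK.
def cheap_model(prompt: str) -> str:
    return f"[cheap] {prompt}"


def reasoning_model(prompt: str) -> str:
    return f"[reasoning] {prompt}"


easy = route("Translate 'hello' to French", cheap_model, reasoning_model)
hard = route("Prove that sqrt(2) is irrational", cheap_model, reasoning_model)
```

In practice the heuristic is the part worth tuning: a router that misclassifies hard prompts as easy saves money at the expense of accuracy.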