Best for: Reasoning

Best LLM for Reasoning

Ranked on MMLU-Pro, GPQA, and AIME. Price is a tiebreaker — reasoning quality dominates for reasoning-heavy work.

Updated June 2026. Top 3 this month: R1 0528, Qwen3.5 Plus 2026-02-15, and Qwen3.5 397B A17B.

Podium

This month’s top three.

1. R1 0528 (DeepSeek): Input $0.50 / 1M, Output $2.15 / 1M, Context 163,840
2. Qwen3.5 Plus 2026-02-15 (Qwen): Input $0.26 / 1M, Output $1.56 / 1M, Context 1,000,000
3. Qwen3.5 397B A17B (Qwen): Input $0.39 / 1M, Output $2.34 / 1M, Context 262,144

How we score

Weights tuned for reasoning.

Reasoning workloads (math, logic, science, multi-step planning) reward the top-tier frontier models disproportionately. The gap between the best and second-best can be a 20-point accuracy swing. We therefore give the three reasoning benchmarks 80% of the total weight and price the remaining 20%, so price mainly acts as a tiebreaker.

Our full methodology is published on the methodology page.

Pillars and weights:

MMLU-Pro: 35%
GPQA: 25%
AIME: 20%
Price: 20%
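
To make the weighting concrete, here is a minimal sketch of how the four pillars could be combined into one score. Only the weights come from this page. The 0-100 scale for each pillar, the linear price normalization, the `price_score` and `weighted_score` helpers, and the example inputs are all assumptions for illustration; the actual method is the one described on the methodology page.

```python
# Pillar weights as published on this page; they sum to 1.0.
WEIGHTS = {"mmlu_pro": 0.35, "gpqa": 0.25, "aime": 0.20, "price": 0.20}


def price_score(price: float, cheapest: float, priciest: float) -> float:
    """Map a price onto 0-100, where the cheapest model scores 100.

    Assumption: linear min-max scaling; the real normalization may differ.
    """
    if priciest == cheapest:
        return 100.0
    return 100.0 * (priciest - price) / (priciest - cheapest)


def weighted_score(pillars: dict[str, float]) -> float:
    """Combine per-pillar scores (each assumed to be on 0-100) into one number."""
    return sum(WEIGHTS[name] * pillars[name] for name in WEIGHTS)


# Placeholder inputs for illustration only, not measured benchmark results.
example = {
    "mmlu_pro": 80.0,
    "gpqa": 70.0,
    "aime": 60.0,
    "price": price_score(1.0, cheapest=0.5, priciest=2.5),
}
print(round(weighted_score(example), 2))  # 72.5
```

With these placeholder numbers, the benchmarks contribute 57.5 points and price contributes 15, which shows how little room price has to reorder models that differ clearly on the benchmarks.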

Full ranking

Top models

Rank	Model	Provider	Input $/1M	Output $/1M	Context
1	R1 0528	DeepSeek	$0.50	$2.15	163,840
2	Qwen3.5 Plus 2026-02-15	Qwen	$0.26	$1.56	1,000,000
3	Qwen3.5 397B A17B	Qwen	$0.39	$2.34	262,144
4	MiniMax M2.1	MiniMax	$0.29	$0.95	196,608
5	Claude Sonnet 4.5	Anthropic	$3.00	$15.00	1,000,000
6	MiMo-V2-Flash	Xiaomi	$0.09	$0.29	262,144
7	Qwen3.5-122B-A10B	Qwen	$0.26	$2.08	262,144
8	Qwen3.5-27B	Qwen	$0.20	$1.56	262,144
9	Olmo 3 32B Think	Allen AI	$0.15	$0.50	65,536
10	Qwen3.5-35B-A3B	Qwen	$0.16	$1.30	262,144

Field notes

Tips for reasoning

01 Turn on native reasoning mode if the model offers it; the accuracy gains are real.
02 Reasoning mode costs more tokens. Budget accordingly.
03 Put a cheap model and a reasoning model behind a router to control cost.
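
The router idea in the last tip can be sketched as follows. This is a hedged illustration, not a recommended production design: the keyword list, the `looks_hard` heuristic, the word-count threshold, and the stand-in model callables are all hypothetical, and a real router would more likely use a trained classifier or a confidence signal from the cheap model.

```python
from typing import Callable

# Hypothetical keyword markers; substring matching is crude and will misfire
# on some prompts (for example, "explanation" contains "plan").
HARD_MARKERS = ("prove", "derive", "step by step", "plan", "optimize")


def looks_hard(prompt: str, max_easy_words: int = 200) -> bool:
    """Crude heuristic: long prompts or reasoning keywords count as hard."""
    text = prompt.lower()
    return len(text.split()) > max_easy_words or any(m in text for m in HARD_MARKERS)


def route(prompt: str,
          cheap_model: Callable[[str], str],
          reasoning_model: Callable[[str], str]) -> str:
    """Send the prompt to exactly one of the two models."""
    model = reasoning_model if looks_hard(prompt) else cheap_model
    return model(prompt)


# Stand-in callables so the sketch runs without any API client.
def cheap(prompt: str) -> str:
    return "cheap:" + prompt


def strong(prompt: str) -> str:
    return "reasoning:" + prompt


print(route("What is the capital of France?", cheap, strong))
print(route("Prove that the sum of two even numbers is even.", cheap, strong))
```

The first prompt goes to the cheap model and the second to the reasoning model, so the more expensive output tokens are spent only where the heuristic expects them to pay off.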

FAQ

Frequently asked questions

The questions teams ask before picking a model for reasoning.


As of June 2026, our weighted top 3 for reasoning are R1 0528, Qwen3.5 Plus 2026-02-15, and Qwen3.5 397B A17B.

Yes: reasoning mode typically uses 2–5x the output tokens, occasionally more. Check your billing.

Not well on frontier benchmarks. For simple chains of thought they can be OK, but multi-step reasoning clearly separates the top tier.
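
As a rough budgeting aid for the 2–5x output-token figure mentioned in the FAQ, the arithmetic can be sketched like this. The prices are the R1 0528 figures from the ranking table; the `request_cost` helper and the workload of 2,000 input tokens and 500 output tokens per request are hypothetical values chosen only to show the calculation.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# R1 0528 prices from the ranking table: $0.50 input, $2.15 output per 1M.
IN_PRICE, OUT_PRICE = 0.50, 2.15

# Hypothetical workload without reasoning mode: 2,000 input, 500 output tokens.
base = request_cost(2_000, 500, IN_PRICE, OUT_PRICE)

for multiplier in (2, 5):
    with_reasoning = request_cost(2_000, 500 * multiplier, IN_PRICE, OUT_PRICE)
    print(f"{multiplier}x output tokens: ${with_reasoning:.6f} "
          f"vs ${base:.6f} baseline")
```

Under these assumed token counts the per-request cost rises from about $0.0021 to between roughly $0.0032 and $0.0064, because only the output side of the bill scales with the multiplier.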
