Ideal for: Coding

Best LLM for Coding

Ranked on SWE-Bench, HumanEval, and dollars-per-1M output tokens. Balanced for autonomous and assistive coding workflows.

Updated April 2026. This month's top 3: GPT-4o (2024-11-20), Claude Sonnet 4.5, and GPT-5 Codex.

Podium

This month’s top three.

  • 1. GPT-4o (2024-11-20), OpenAI. Input $2.50 / 1M · Output $10.00 / 1M · Context 128,000
  • 2. Claude Sonnet 4.5, Anthropic. Input $3.00 / 1M · Output $15.00 / 1M · Context 1,000,000
  • 3. GPT-5 Codex, OpenAI. Input $1.25 / 1M · Output $10.00 / 1M · Context 400,000

How we score

Weights tuned for coding.

Choosing an LLM for coding comes down to three things: how well it turns specifications into working code, how well it reasons about large repositories, and how much it will cost once you wire it into CI or an agent loop. We weight SWE-Bench heaviest because it best predicts real-world coding-agent success, followed by HumanEval for short-form correctness, and a price pillar so the recommendation survives contact with a finance review.

Our full methodology is published on the methodology page.

Pillars and weights:

  • SWE-Bench: 50%
  • HumanEval: 30%
  • Price: 20%
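The weighting above can be sketched as a simple linear combination. This is a minimal illustration of the scoring scheme, not our actual pipeline; the pillar scores below are made-up example values, and we assume benchmarks are normalized to 0-1 with price inverted so cheaper models score higher.

```python
# Sketch of the weighted scoring described above. Benchmark scores are
# assumed normalized to 0-1; the price pillar is assumed inverted so
# cheaper models score higher. Example inputs are illustrative only.

WEIGHTS = {"swe_bench": 0.50, "humaneval": 0.30, "price": 0.20}

def weighted_score(swe_bench: float, humaneval: float, price_score: float) -> float:
    """Combine the three pillars into a single 0-1 ranking score."""
    return (WEIGHTS["swe_bench"] * swe_bench
            + WEIGHTS["humaneval"] * humaneval
            + WEIGHTS["price"] * price_score)

# Hypothetical pillar scores for two models:
print(round(weighted_score(0.72, 0.90, 0.60), 3))  # 0.75
print(round(weighted_score(0.65, 0.95, 0.80), 3))  # 0.77
```

Note how the second model wins despite a lower SWE-Bench score: the price pillar can flip close calls, which is exactly why it is capped at 20%.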

Full ranking

Top models

| Rank | Model | Provider | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|---|
| 1 | GPT-4o (2024-11-20) | OpenAI | $2.50 | $10.00 | 128,000 |
| 2 | Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 1,000,000 |
| 3 | GPT-5 Codex | OpenAI | $1.25 | $10.00 | 400,000 |
| 4 | Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1,048,576 |
| 5 | Gemini 2.5 Pro Preview 06-05 | Google | $1.25 | $10.00 | 1,048,576 |
| 6 | GPT-5.1-Codex | OpenAI | $1.25 | $10.00 | 400,000 |
| 7 | o3 | OpenAI | $2.00 | $8.00 | 200,000 |
| 8 | Claude 3.7 Sonnet | Anthropic | $3.00 | $15.00 | 200,000 |
| 9 | Claude 3.7 Sonnet (thinking) | Anthropic | $3.00 | $15.00 | 200,000 |
| 10 | GPT-5 Mini | OpenAI | $0.25 | $2.00 | 400,000 |
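To turn the per-1M-token prices in the table into a budget number, multiply by your monthly token volume. A minimal sketch, using pricing from the table above; the 50M-input / 5M-output volumes are assumptions you should replace with your own usage metrics.

```python
# Rough monthly cost estimate from the $-per-1M-token prices in the
# ranking table. Token volumes below are illustrative assumptions.

PRICING = {  # model: (input $/1M, output $/1M), taken from the table above
    "GPT-4o (2024-11-20)": (2.50, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "GPT-5 Codex": (1.25, 10.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a given monthly token volume."""
    inp, out = PRICING[model]
    return inp * input_tokens / 1e6 + out * output_tokens / 1e6

# Example: 50M input / 5M output tokens per month.
for model in PRICING:
    print(f"{model}: ${monthly_cost(model, 50_000_000, 5_000_000):,.2f}")
```

Coding agents typically read far more tokens than they write, so input price dominates at scale: at this volume GPT-5 Codex comes in well under Claude Sonnet 4.5 despite the identical output rate to GPT-4o.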

Field notes

Tips for coding

  • 01

    Prefer a model with a large context window if your repo is bigger than ~200 files.

  • 02

    Use batch pricing for CI / nightly refactor jobs; interactive IDE work stays on the standard price.

  • 03

    Check function-calling reliability before committing to an agentic flow.
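Tip 03 can be checked with a small smoke test before you commit to a provider: run the same tool-use prompt repeatedly and measure how often the model returns well-formed arguments. A minimal sketch; `write_file` and its schema are hypothetical, and feeding in real model outputs (via your provider's SDK) is left to you.

```python
# Sketch of a function-calling reliability smoke test (tip 03).
# The tool schema and the sample outputs below are hypothetical;
# in practice, collect `samples` from repeated calls to your provider.
import json

EXPECTED_KEYS = {"path", "content"}  # schema of a hypothetical write_file tool

def tool_call_ok(raw_arguments: str) -> bool:
    """True if the model emitted parseable JSON with the required keys."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and EXPECTED_KEYS <= args.keys()

def reliability(samples: list) -> float:
    """Fraction of tool calls that were well-formed across repeated runs."""
    return sum(tool_call_ok(s) for s in samples) / len(samples)

# Hypothetical raw outputs from 4 repeated runs of the same prompt:
runs = [
    '{"path": "a.py", "content": "print(1)"}',
    '{"path": "a.py", "content": "print(1)"}',
    '{"path": "a.py"}',    # missing required key
    'write_file(a.py)',    # not JSON at all
]
print(reliability(runs))  # 0.5
```

Anything below roughly 95% well-formed calls tends to surface as retries and stalled agent loops, so measure this before wiring the model into CI.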

FAQ

Frequently asked questions

The questions teams ask before picking a model for coding.


  • Which model is currently best for coding?
    As of April 2026, our weighted top 3 are GPT-4o (2024-11-20), Claude Sonnet 4.5, and GPT-5 Codex.

  • Should we pick Claude or GPT?
    Claude wins long-horizon refactoring; GPT wins short-burst correctness. The right answer depends on your workload mix, so check the scoring pillars above.

  • Is fine-tuning a coding model worth it?
    Rarely. Fine-tuning on proprietary code still helps, but for 90% of shops a strong frontier model with RAG over the repo gets you most of the way.

  • What about open-weight models?
    DeepSeek and Meta Llama variants are competitive on price. We list their hosted pricing here; self-host economics live in our Shadow AI audit tool.