What the data shows
As of April 2026, the cheapest production AI model in the Buzzi.ai pricing database is Trinity Large Preview at $0.00 per million input tokens. We track 330 models by price, quality benchmarks, context window, and data residency, refreshed daily.
How it works
Three quick questions.
A real cost number in return.
No sign-up, no spreadsheet, no jargon. Built for founders, product teams, and engineers who need an answer in under a minute.
Step one
Pick a scenario.
Tell us what you want AI to do — chat, code, extract data, understand images, reason, or bulk processing. We filter the model list to what matters.
Step two
Set your usage.
Share a rough sense of volume and message length. No tokens, no math — plain English with anchors like "a side project" or "production scale."
Step three
Compare real costs.
Every model card shows your personalized monthly cost. Side-by-side bars surface the cheapest pick and how much you’d save by switching.
What we track
One database. Every provider worth watching.
330 production-ready models across 54 providers — pricing, context window, benchmarks, regions, and compliance on every row. Refreshed each morning from official pricing pages, cross-checked against third-party aggregators.
Production-ready models
330
tracked
Providers covered
54
worldwide
Quality benchmarks
1
per model
Refresh cadence
Daily
price sync
All providers
- OpenAI (55)
- Qwen (45)
- Google (31)
- Mistral (22)
- Meta (14)
- Anthropic (14)
- Z.ai (13)
- NVIDIA (10)
- DeepSeek (10)
- xAI (10)
- MiniMax (8)
- Arcee AI (7)
- Nous Research (6)
- Amazon (5)
- Sao10K (5)
- Baidu (5)
- Perplexity (5)
- OpenRouter (4)
- Moonshot AI (4)
- Cohere (4)
- ByteDance Seed (4)
- TheDrummer (4)
- Aion Labs (4)
- Liquid AI (3)
- Xiaomi (3)
- Microsoft (2)
- Reka AI (2)
- Allen AI (2)
- Morph (2)
- Relace (2)
- Inflection (2)
- Cognitive Computations (1)
- IBM (1)
- Gryphe (1)
- Alibaba (1)
- ByteDance (1)
- StepFun (1)
- Nex AGI (1)
- Tencent (1)
- Essential AI (1)
- Upstage (1)
- Prime Intellect (1)
- Inception (1)
- Kwaipilot (1)
- TNG (1)
- Undi95 (1)
- Writer (1)
- Mancer (1)
- AlfredPros (1)
- Switchpoint (1)
- Deep Cogito (1)
- AI21 Labs (1)
- Anthracite (1)
- Alpindale (1)
Priced today
The latest flagship model from every major lab.
Prices are per 1 million tokens. Cached and batched rates apply when you reuse prompts or accept a delay. Click a row to open the full model page.
| Provider | Model | Context | Input /1M | Output /1M |
|---|---|---|---|---|
| OpenAI | o1-pro | 200K | $150.00 | $600.00 |
| OpenAI | GPT-5.4 Pro | 1.1M | $30.00 | $180.00 |
| OpenAI | GPT-5.2 Pro | 400K | $21.00 | $168.00 |
| Anthropic | Claude Opus 4.6 (Fast) | 1M | $30.00 | $150.00 |
| Anthropic | Claude Opus 4 | 200K | $15.00 | $75.00 |
| Anthropic | Claude Opus 4.1 | 200K | $15.00 | $75.00 |
| Google | Gemini 3.1 Pro Preview | 1M | $2.00 | $12.00 |
| Google | Gemini 3.1 Pro Preview Custom Tools | 1M | $2.00 | $12.00 |
| Google | Nano Banana Pro (Gemini 3 Pro Image Preview) | 66K | $2.00 | $12.00 |
| Alibaba | Tongyi DeepResearch 30B A3B | 131K | $0.09 | $0.45 |
| DeepSeek | R1 0528 | 164K | $0.50 | $2.15 |
| DeepSeek | DeepSeek V3.2 Speciale | 164K | $0.40 | $1.20 |
| DeepSeek | DeepSeek V3 | 164K | $0.32 | $0.89 |
| Amazon | Nova Premier 1.0 | 1M | $2.50 | $12.50 |
| Amazon | Nova Pro 1.0 | 300K | $0.80 | $3.20 |
| Amazon | Nova 2 Lite | 1M | $0.30 | $2.50 |
| NVIDIA | Llama 3.1 Nemotron 70B Instruct | 131K | $1.20 | $1.20 |
| NVIDIA | Nemotron Nano 12B 2 VL | 131K | $0.20 | $0.60 |
| NVIDIA | Nemotron 3 Super | 262K | $0.09 | $0.45 |
| MiniMax | MiniMax M1 | 1M | $0.40 | $2.20 |
| MiniMax | MiniMax M2-her | 66K | $0.30 | $1.20 |
| MiniMax | MiniMax M2.7 | 197K | $0.30 | $1.20 |
Top 3 priced models per provider by list price. Prices refreshed daily from each provider’s public pricing page.
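As a rough sketch of how these per-1M list prices translate into per-request dollars (the token counts below are made up for illustration; the prices are taken from the DeepSeek V3 and o1-pro rows above):

```python
def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request; prices are $ per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Example: 2,000 input tokens and 500 output tokens.
deepseek_v3 = request_cost(2_000, 500, 0.32, 0.89)      # ~$0.0011 per request
o1_pro      = request_cost(2_000, 500, 150.00, 600.00)  # $0.60 per request
```

The same request is roughly 550x cheaper on the budget row, which is why volume assumptions dominate the comparison.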
Best-for use cases
Pick the workload, get the shortlist.
Every use case has its own ranking — with weights tuned to what actually matters for that workload. Each page shows the top models, the pricing math, and the tradeoffs behind the pick.
- Coding
Ranked on SWE-Bench, HumanEval, and dollars-per-1M output tokens. Balanced for autonomous and assistive coding workflows.
- SWE-Bench 50%
- HumanEval 30%
- price 20%
- RAG (Retrieval-Augmented Generation)
Ranked on long-context accuracy, groundedness, and input-token price — RAG is input-token-heavy by design.
- Long-context accuracy 50%
- MMLU 20%
- input price 30%
- AI Agents
Ranked on multi-step reasoning, tool-use reliability, and long-horizon stability. Agentic workloads amplify small accuracy gaps.
- SWE-Bench Verified 40%
- AgentBench 30%
- MMLU 15%
- Multimodal (Vision + Text)
Ranked on vision benchmark accuracy, context window, and combined per-query cost for image + text workloads.
- MMMU 40%
- DocVQA 30%
- price 30%
- Cheap Bulk Workloads
Ranked primarily on input and output $/1M with a benchmark floor so you do not ship junk at volume.
- input price 50%
- output price 30%
- MMLU 20%
- Long-Context Workloads
Ranked on context window size, needle-in-a-haystack accuracy, and input price — long-context is input-token-heavy.
- context window 25%
- long-context accuracy 45%
- input price 30%
- Reasoning
Ranked on MMLU-Pro, GPQA, and AIME. Price is a tiebreaker — reasoning quality dominates for reasoning-heavy work.
- MMLU-Pro 35%
- GPQA 25%
- AIME 20%
- JSON / Structured Output
Ranked on JSON-mode reliability, schema-adherence, and price. Failures here tax the rest of your pipeline.
- JSON mode 50%
- schema adherence 30%
- price 20%
- Function Calling / Tool Use
Ranked on tool-selection accuracy, multi-tool consistency, and price. Tool-use quality compounds in agent loops.
- tool selection 45%
- multi-tool 30%
- price 25%
- Healthcare
Ranked for HIPAA-eligible deployments, clinical reasoning, and data-residency options. Compliance pillar dominates.
- HIPAA availability 45%
- MedQA 25%
- MMLU 15%
- EU Data Residency
Ranked on EU-region availability, GDPR posture, and price. Built for European customers with data-residency requirements.
- EU residency 50%
- GDPR 20%
- MMLU 15%
- Government / FedRAMP
Ranked on FedRAMP / IL authorization, data sovereignty, and reasoning quality. Certification pillar dominates.
- FedRAMP 50%
- MMLU 20%
- data sovereignty 15%
Beyond sticker price
Five calculators that sit behind the main flow.
Once you’ve narrowed down, dig deeper — migration math, real-prompt costs, curated stacks, lifecycle risk, and compliance.
Switch cost calculator
Before you migrate, see how migration engineering hours weigh against the monthly savings over 12 months.
Prompt cost
Paste a real prompt and reply. Get a per-provider cost at today’s rates, tokenized with the right family coefficient.
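A minimal sketch of the idea behind the prompt-cost calculator: approximate token counts with a characters-per-token coefficient that varies by tokenizer family, then price at per-1M rates. The family names and coefficients below are assumptions for illustration, not Buzzi.ai's actual values.

```python
# Assumed characters-per-token coefficients per tokenizer family (illustrative).
CHARS_PER_TOKEN = {"gpt": 4.0, "claude": 3.8, "gemini": 4.2}

def prompt_cost(prompt: str, reply: str, family: str,
                in_price: float, out_price: float) -> float:
    """Approximate dollar cost of one prompt/reply pair.

    Prices are $ per 1M tokens; token counts are estimated from
    character length using the family coefficient.
    """
    coef = CHARS_PER_TOKEN[family]
    in_tokens = len(prompt) / coef
    out_tokens = len(reply) / coef
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000
```

A real implementation would use each provider's tokenizer; the coefficient is only a fast approximation.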
Model stacks
Editorial picks for budget, balanced, and frontier use — curated by our applied-AI team, refreshed monthly.
Lifecycle timeline
Which models are sunsetting, when, and what the provider is pushing customers toward.
Compliance matrix
Regions and certifications (SOC 2, HIPAA, GDPR, FedRAMP) per provider, in one grid.
FAQ
Questions we get asked most.
Pricing freshness, sourcing, cache and batch discounts, embedding, alerts — all the things teams ask before picking a model.
As of April 2026, the lowest input $/1M on our comparison is Trinity Large Preview. Real-world cost depends on your cache hit rate and batch eligibility.
We mirror pricing from official provider pricing pages and docs. Each model row has a "last verified" timestamp and a link to the source so you can check yourself.
A nightly snapshot cron diffs against the previous day. When a change is detected we log it and email subscribed users within 24 hours.
Models that offer cached input pricing get a separate column. The volume calculator multiplies your cache hit rate by the cached price and the rest by the standard input price.
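The blend described above reduces to a weighted average. A minimal sketch (function name and example rates are illustrative):

```python
def blended_input_price(std_price: float, cached_price: float,
                        hit_rate: float) -> float:
    """Effective $/1M input tokens given a cache hit rate in [0, 1]."""
    return hit_rate * cached_price + (1 - hit_rate) * std_price

# 70% cache hits on a model priced $3.00 standard / $0.30 cached:
effective = blended_input_price(3.00, 0.30, 0.70)  # $1.11 per 1M input tokens
```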
Providers that support async batch endpoints usually list a reduced price. If a model row has a batch price, you can set the "batch eligible" slider to model cost savings for that workload share.
Our top recommendation: pick two candidates from the filtered shortlist, estimate break-even with the switch-cost calculator, and run your real prompts through "Compare my prompt" for a grounded test. Top 3 this month: Trinity Large Preview, Auto Router, Body Builder (beta).
Yes. The comparison, calculators, and public JSON API are free. Signing in with Google unlocks "Compare my prompt", saved comparisons, and price alerts.
We list the top open-weight models (Meta Llama, Mistral, DeepSeek) when a pay-per-token API exists. Self-host cost modeling is not included since it depends on your GPU inventory.
Yes — the /embed route renders a minimal iframe with attribution. Use the embed builder on the main page to generate the snippet.
After signing in you can subscribe to any model. When the nightly snapshot detects a price change or deprecation, you get an email within 24 hours.
Each task has a weighted score over benchmarks relevant to that task plus a price pillar. We publish the exact weights on the methodology page.
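The scoring described above can be sketched as a weighted sum over pillars. The weights and benchmark scores below are invented for illustration; the real weights are on the methodology page.

```python
def task_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over the pillars named in `weights` (scores normalized 0-1)."""
    return sum(weights[k] * scores[k] for k in weights)

# Hypothetical coding weights and one model's normalized pillar scores:
coding_weights = {"swe_bench": 0.5, "humaneval": 0.3, "price": 0.2}
model_scores   = {"swe_bench": 0.62, "humaneval": 0.88, "price": 0.75}
score = task_score(model_scores, coding_weights)
```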
No. The ranking is not pay-to-play. Providers pay us nothing.
Scaling up a rollout?
Need help picking a model for your use case?
A 30-minute call with a Buzzi applied-AI lead. We look at your volume, data, and constraints and recommend a stack that actually makes it to production.