Best for: Function Calling / Tool Use

Best LLM for Function Calling / Tool Use

Ranked on tool-selection accuracy, multi-tool consistency, and price. Tool-use quality compounds in agent loops.

Updated April 2026. Top 3 this month: R1 0528, Qwen3.5 Plus 2026-02-15, DeepSeek V3.

Podium

This month’s top three.

  • 1: R1 0528 (DeepSeek)
    Input / 1M: $0.50
    Output / 1M: $2.15
    Context: 163,840
  • 2: Qwen3.5 Plus 2026-02-15 (Qwen)
    Input / 1M: $0.26
    Output / 1M: $1.56
    Context: 1,000,000
  • 3: DeepSeek V3 (DeepSeek)
    Input / 1M: $0.32
    Output / 1M: $0.89
    Context: 163,840

How we score

Weights tuned for function calling / tool use.

Function calling is the connective tissue of agent systems. A model that picks the wrong tool once in 20 calls is unacceptable for any non-trivial automation. We weight tool-selection accuracy and multi-tool benchmarks heavily, then price.
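To see why a one-in-20 miss rate matters, here is a minimal arithmetic sketch of how per-call accuracy compounds over an agent loop. It assumes every tool selection is independent and that a single wrong pick fails the whole run; both are simplifications, and the function name is illustrative rather than part of our scoring code.

```python
def loop_success_rate(per_call_accuracy: float, calls: int) -> float:
    """Chance that all `calls` tool selections are correct.

    Assumption: selections are independent and one wrong pick fails the run.
    """
    if not 0.0 <= per_call_accuracy <= 1.0:
        raise ValueError("per_call_accuracy must be between 0 and 1")
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return per_call_accuracy ** calls


if __name__ == "__main__":
    # 0.95 corresponds to one wrong tool pick in 20 calls.
    for calls in (1, 5, 10, 20):
        print(f"{calls:>2} calls: {loop_success_rate(0.95, calls):.3f}")
```

Under these assumptions, a model that is right 95% of the time per call completes a 20-call loop without a wrong pick only about 36% of the time.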

Our full methodology is published on the methodology page.

Pillars and weights:

  • tool selection: 45%
  • multi-tool: 30%
  • price: 25%
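As a rough illustration of how the three weights could combine into one score, the sketch below takes a weighted sum of per-pillar sub-scores. The weights come from the list above; the assumption that each pillar is already normalized to a 0 to 1 range (with a higher price score meaning cheaper) and the example sub-scores are hypothetical, not our published numbers.

```python
# Pillar weights as listed above.
WEIGHTS = {"tool_selection": 0.45, "multi_tool": 0.30, "price": 0.25}


def weighted_score(pillars: dict[str, float]) -> float:
    """Weighted sum of pillar sub-scores.

    Assumption: every sub-score is normalized to [0, 1], higher is better.
    """
    missing = WEIGHTS.keys() - pillars.keys()
    if missing:
        raise KeyError(f"missing pillar scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * pillars[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical sub-scores for an imaginary model.
    example = {"tool_selection": 0.9, "multi_tool": 0.8, "price": 0.6}
    print(f"{weighted_score(example):.3f}")  # 0.45*0.9 + 0.30*0.8 + 0.25*0.6
```

Because tool selection carries the largest weight, a cheap model with weak selection accuracy cannot make up the difference on price alone.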

Full ranking

Top models

Rank  Model                    Provider   Input $/1M  Output $/1M  Context
1     R1 0528                  DeepSeek   $0.50       $2.15        163,840
2     Qwen3.5 Plus 2026-02-15  Qwen       $0.26       $1.56        1,000,000
3     DeepSeek V3              DeepSeek   $0.32       $0.89        163,840
4     Qwen3.5 397B A17B        Qwen       $0.39       $2.34        262,144
5     Hunyuan A13B Instruct    Tencent    $0.14       $0.57        131,072
6     MiniMax M2.1             MiniMax    $0.29       $0.95        196,608
7     Trinity Large Preview    Arcee AI   $0.00       $0.00        131,000
8     GPT-4o (2024-11-20)      OpenAI     $2.50       $10.00       128,000
9     MiniMax-01               MiniMax    $0.20       $1.10        1,000,192
10    Claude Sonnet 4.5        Anthropic  $3.00       $15.00       1,000,000

Field notes

Tips for function calling / tool use

  • 01

    Keep the tool list short and well-named. Long tool lists degrade accuracy.

  • 02

    Use JSON schemas with required fields to reduce malformed calls.

  • 03

    Log tool failures and retry with a fallback model tier if needed.
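Tips 02 and 03 can be combined in a small sketch: validate each tool call against a schema with required fields, log any failure, and retry on the next model tier. Everything named here is an assumption for illustration: the `get_weather`-style schema, the tier names, and the `request_tool_call` callable, which stands in for whatever client you use to ask a model for tool arguments. The validator checks only required and unknown fields; a full JSON Schema validator would also check types and enums.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("tool_calls")

# Hypothetical tool parameters in JSON Schema style, with a required field.
GET_WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
    "additionalProperties": False,
}


def validate_arguments(raw_arguments: str, schema: dict) -> dict:
    """Parse raw JSON arguments and check required and unknown fields."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if schema.get("additionalProperties") is False:
        unknown = [name for name in args if name not in schema.get("properties", {})]
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")
    return args


def call_with_fallback(
    model_tiers: list[str],
    request_tool_call: Callable[[str], str],
    schema: dict,
) -> tuple[str, dict]:
    """Try each model tier in order; log malformed calls and move on."""
    for model in model_tiers:
        try:
            return model, validate_arguments(request_tool_call(model), schema)
        except ValueError as exc:
            logger.warning("tool call failed on %s: %s", model, exc)
    raise RuntimeError("all model tiers produced invalid tool calls")
```

In use, `call_with_fallback(["primary-model", "fallback-model"], my_client_fn, GET_WEATHER_SCHEMA)` returns the first tier whose arguments pass validation, and the warning log shows how often the primary tier needed rescuing.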

FAQ

Frequently asked questions

The questions teams ask before picking a model for function calling / tool use.

As of April 2026, our weighted top 3 are R1 0528, Qwen3.5 Plus 2026-02-15, DeepSeek V3.
Accuracy drops noticeably past ~30 tools in a single call. Route to a smaller toolset per conversation turn when you can.
Directionally yes — run the top 2 on your actual tool catalog before committing.
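For the advice above about routing to a smaller toolset per conversation turn, here is a minimal sketch of one way to do it. It assumes each tool carries a hand-written keyword set and picks tools by keyword overlap with the user message; the `Tool` class, the tool names, and the keyword matching are all hypothetical stand-ins (an embedding-based retriever would be a common alternative). The default cap of 30 mirrors the rough threshold mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    keywords: frozenset[str]  # hypothetical hand-maintained routing hints


def select_tools(user_message: str, catalog: list[Tool], max_tools: int = 30) -> list[Tool]:
    """Return the tools whose keywords overlap the message, best match first."""
    words = set(user_message.lower().split())
    scored = [(len(tool.keywords & words), tool) for tool in catalog]
    relevant = [pair for pair in scored if pair[0] > 0]
    # Stable sort: ties keep their catalog order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in relevant[:max_tools]]


if __name__ == "__main__":
    catalog = [
        Tool("issue_refund", frozenset({"refund", "chargeback"})),
        Tool("lookup_invoice", frozenset({"invoice", "billing"})),
        Tool("get_weather", frozenset({"weather", "forecast"})),
    ]
    picked = select_tools("please refund my last invoice", catalog, max_tools=2)
    print([tool.name for tool in picked])
```

Only the selected subset is then passed to the model for that turn, which keeps the tool list short without shrinking the overall catalog.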