📊 Live Benchmark

AI Model Leaderboard 2026

All relevant large language models at a glance, sortable by Quality Index, speed, latency, and price. Data source: Artificial Analysis.

Currently filtered to Anthropic · 18 models · Last synchronization: July 24, 2026

How do you read this leaderboard?

The Quality Index is a composite score from Artificial Analysis, built from more than ten independent benchmarks (MMLU-Pro, GPQA, HLE, LiveCodeBench, SciCode, AIME, among others). Higher is better. Speed measures output tokens per second, taken as the median across all hosting providers. Latency is the time to first token in seconds, which matters for streaming UIs. Prices are in US dollars per 1 million tokens, listed separately for input and output. Sort by your priority, filter by vendor or price bucket, and check concrete costs in the token calculator or compare two models head-to-head.
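Latency and Speed can be combined into a rough estimate of how long a full streamed answer takes. The sketch below assumes that total time is approximately time to first token plus output length divided by output speed; the function name and the 500-token answer length are illustrative assumptions, and the inputs are taken from the table rows for Claude Haiku 4.5 and Claude Opus 4.7.

```python
def estimated_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate: time to first token plus generation time."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Table values: Claude Haiku 4.5 (733 ms, 101.3 t/s), Claude Opus 4.7 (10.63 s, 60.7 t/s).
haiku = estimated_response_seconds(0.733, 101.3, 500)
opus = estimated_response_seconds(10.63, 60.7, 500)
print(f"Haiku 4.5: {haiku:.1f} s, Opus 4.7: {opus:.1f} s")
```

For short answers the latency term dominates, so a model with a high Speed value can still feel slow in an interactive UI.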

#	Model	Vendor	Quality	Speed	Latency	Price (USD/1M tokens)
1	Anthropic: Claude Fable 5 anthropic/claude-fable-5	Anthropic	59.9	60.2 t/s	73.24 s	$10.00 in $50.00 out
2	Anthropic: Claude Opus 4.8 anthropic/claude-opus-4.8	Anthropic	55.7	65.9 t/s	48.25 s	$5.00 in $25.00 out
3	Anthropic: Claude Opus 4.7 anthropic/claude-opus-4.7	Anthropic	53.5	60.7 t/s	10.63 s	$5.00 in $25.00 out
4	Anthropic: Claude Sonnet 5 anthropic/claude-sonnet-5	Anthropic	41.7	64.9 t/s	5.79 s	$2.00 in $10.00 out
5	Anthropic: Claude Opus 4.6 anthropic/claude-opus-4.6	Anthropic	37.8	69.7 t/s	3.52 s	$5.00 in $25.00 out
6	Anthropic: Claude Sonnet 4.6 anthropic/claude-sonnet-4.6	Anthropic	35.9	66.1 t/s	2.10 s	$3.00 in $15.00 out
7	Anthropic: Claude Opus 4.5 anthropic/claude-opus-4.5	Anthropic	34.7	82.8 t/s	1.77 s	$5.00 in $25.00 out
8	Anthropic: Claude Sonnet 4.5 anthropic/claude-sonnet-4.5	Anthropic	29.3	76.1 t/s	1.34 s	$3.00 in $15.00 out
9	Anthropic: Claude Opus 4.1 anthropic/claude-opus-4.1	Anthropic	28.2	0 t/s	0 ms	$15.00 in $75.00 out
10	Anthropic: Claude Opus 4 anthropic/claude-opus-4	Anthropic	25.5	0 t/s	0 ms	$15.00 in $75.00 out
11	Anthropic: Claude Sonnet 4 anthropic/claude-sonnet-4	Anthropic	25.5	0 t/s	0 ms	$3.00 in $15.00 out
12	Anthropic: Claude Haiku 4.5 anthropic/claude-haiku-4.5	Anthropic	23.7	101.3 t/s	733 ms	$1.00 in $5.00 out
13	Anthropic: Claude 3.7 Sonnet anthropic/claude-3.7-sonnet	Anthropic	23.5	0 t/s	0 ms	$3.00 in $15.00 out
14	Anthropic: Claude 3.5 Haiku anthropic/claude-3.5-haiku	Anthropic	12.3	0 t/s	0 ms	$0 in $0 out
15	Anthropic: Claude 3 Haiku anthropic/claude-3-haiku	Anthropic	3.9	0 t/s	0 ms	$0.25 in $1.25 out
16	Anthropic: Claude Opus 4.6 (Fast) anthropic/claude-opus-4.6-fast	Anthropic	—	— t/s	—	— in — out
17	Anthropic: Claude Opus 4.8 (Fast) anthropic/claude-opus-4.8-fast	Anthropic	—	— t/s	—	— in — out
18	Anthropic: Claude Opus 4.7 (Fast) anthropic/claude-opus-4.7-fast	Anthropic	—	— t/s	—	— in — out

🧮 Tool

Pricing Calculator

Estimate your monthly API costs: enter your expected token volume and we calculate the cost for the top 15 models.

Input tokens (millions/month) · Output tokens (millions/month)

#	Model	Vendor	Quality	Cost/month (USD)
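The calculation behind such a calculator can be sketched as follows, assuming the monthly cost is simply input volume times input price plus output volume times output price. The class and function names are illustrative, not the site's implementation; the prices are copied from three rows of the leaderboard table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_usd_per_m: float   # USD per 1 million input tokens
    output_usd_per_m: float  # USD per 1 million output tokens


# Prices taken from the leaderboard table above (subset for brevity).
PRICES = [
    ModelPrice("Claude Opus 4.8", 5.00, 25.00),
    ModelPrice("Claude Sonnet 5", 2.00, 10.00),
    ModelPrice("Claude Haiku 4.5", 1.00, 5.00),
]


def monthly_cost_usd(model: ModelPrice, input_m: float, output_m: float) -> float:
    """Cost for input_m / output_m million tokens per month."""
    return input_m * model.input_usd_per_m + output_m * model.output_usd_per_m


def cost_table(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs, cheapest first."""
    rows = [(m.name, monthly_cost_usd(m, input_m, output_m)) for m in PRICES]
    return sorted(rows, key=lambda row: row[1])


# Example: 20 million input tokens and 5 million output tokens per month.
for name, cost in cost_table(20, 5):
    print(f"{name}: ${cost:.2f}")
```

With 20 million input and 5 million output tokens, this yields $45.00 for Claude Haiku 4.5, $90.00 for Claude Sonnet 5, and $225.00 for Claude Opus 4.8.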

📈 Tool

Quality vs. Price (Pareto Chart)

Where is the best trade-off? Models in the upper left are the Pareto optima: high quality, low price.

Legend: Pareto optimum · Other models
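A minimal sketch of how Pareto-optimal models could be identified: a model is kept if no other model is at least as cheap and at least as good while being strictly better on one of the two axes. Which price the chart uses is not stated, so the output price per 1 million tokens is assumed here; the function name is illustrative and the values come from the leaderboard table.

```python
def pareto_front(models: list[tuple[str, float, float]]) -> list[str]:
    """models: (name, price, quality). Lower price and higher quality are better."""
    front = []
    for name, price, quality in models:
        dominated = any(
            p <= price and q >= quality and (p < price or q > quality)
            for _, p, q in models
        )
        if not dominated:
            front.append(name)
    return front


# (name, output price in USD per 1M tokens, Quality Index) from the table above.
MODELS = [
    ("Claude Fable 5", 50.00, 59.9),
    ("Claude Opus 4.8", 25.00, 55.7),
    ("Claude Opus 4.7", 25.00, 53.5),
    ("Claude Sonnet 5", 10.00, 41.7),
    ("Claude Sonnet 4.6", 15.00, 35.9),
    ("Claude Haiku 4.5", 5.00, 23.7),
]

print(pareto_front(MODELS))
```

In this subset, Claude Opus 4.7 is dominated by Claude Opus 4.8 (same price, higher quality) and Claude Sonnet 4.6 by Claude Sonnet 5 (cheaper and better), so the remaining four models form the front.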

❓ Frequently Asked Questions about the AI Model Leaderboard

Where does the leaderboard data come from?

Quality Index, speed (output tokens/s), latency (time to first token), and prices (USD per 1 million tokens) come directly from Artificial Analysis. The data is synchronized daily at 04:00 UTC. We do not store benchmark values of our own and do not run our own evaluation.

What exactly does the Quality Index measure?

The Artificial Analysis Intelligence Index is a composite score built from more than ten independent benchmarks: MMLU-Pro (knowledge), GPQA and HLE (reasoning), LiveCodeBench and SciCode (coding), AIME (mathematics), IFBench (instruction following), LCR (long-context recall), and τ² (tool use). The scale runs from 0 to 100; higher is better.

Why are some well-known models missing from the leaderboard?

We only show models that are marked is_active=true in the registry and for which Artificial Analysis provides complete benchmark data. Pure image or audio models, deprecated versions, and closed-beta models without a public API do not appear.
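The inclusion rule can be sketched as a simple predicate. Only the is_active flag is named in the answer above; every other field name, the choice of which fields count as "complete", and the example slugs are hypothetical and serve purely as illustration.

```python
from typing import Any, Mapping

# Hypothetical field names; only is_active appears in the FAQ text.
REQUIRED_FIELDS = ("quality", "speed_tps", "latency_s", "price_in", "price_out")


def is_listed(entry: Mapping[str, Any]) -> bool:
    """True if the entry is active and every required field has a value."""
    if not entry.get("is_active", False):
        return False
    return all(entry.get(field) is not None for field in REQUIRED_FIELDS)


complete = {"quality": 41.7, "speed_tps": 64.9, "latency_s": 5.79,
            "price_in": 2.0, "price_out": 10.0}

registry = [
    {"slug": "example/model-a", "is_active": True, **complete},
    {"slug": "example/model-b", "is_active": False, **complete},
    {"slug": "example/model-c", "is_active": True, **complete, "quality": None},
]

visible = [entry["slug"] for entry in registry if is_listed(entry)]
print(visible)
```

Here only example/model-a passes: model-b is inactive and model-c lacks a quality score.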

How do I use sorting and filters effectively?

Sort by Quality for maximum accuracy, by Speed for high throughput, by Latency for responsive streaming UIs, and by Price for cost-sensitive workloads. Combine the vendor and price filters to find, for example, "only OpenAI under 5 USD per million output tokens".
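The same sort-and-filter logic can be reproduced offline. The sketch below is an assumption about how such a selection could work, not the site's code: the function and dictionary keys are illustrative, the price filter is treated as a strict "under" comparison, and the rows are a subset of the Anthropic table with latency converted to seconds.

```python
# Subset of the table; latency in seconds, price_out in USD per 1M output tokens.
ROWS = [
    {"model": "Claude Opus 4.8", "vendor": "Anthropic", "quality": 55.7,
     "speed": 65.9, "latency": 48.25, "price_out": 25.0},
    {"model": "Claude Sonnet 5", "vendor": "Anthropic", "quality": 41.7,
     "speed": 64.9, "latency": 5.79, "price_out": 10.0},
    {"model": "Claude Sonnet 4.6", "vendor": "Anthropic", "quality": 35.9,
     "speed": 66.1, "latency": 2.10, "price_out": 15.0},
    {"model": "Claude Haiku 4.5", "vendor": "Anthropic", "quality": 23.7,
     "speed": 101.3, "latency": 0.733, "price_out": 5.0},
]

# Higher is better for quality and speed; lower is better for latency and price.
DESCENDING = {"quality": True, "speed": True, "latency": False, "price_out": False}


def select(rows, sort_by, vendor=None, max_price_out=None):
    """Filter by vendor and output price, then sort so the best row comes first."""
    kept = [
        r for r in rows
        if (vendor is None or r["vendor"] == vendor)
        and (max_price_out is None or r["price_out"] < max_price_out)
    ]
    return sorted(kept, key=lambda r: r[sort_by], reverse=DESCENDING[sort_by])


best_cheap = select(ROWS, "quality", vendor="Anthropic", max_price_out=15)
print([r["model"] for r in best_cheap])

most_responsive = select(ROWS, "latency")
print([r["model"] for r in most_responsive])
```

The first query returns Claude Sonnet 5 ahead of Claude Haiku 4.5; the second orders all four models from lowest to highest time to first token, starting with Claude Haiku 4.5.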

How do I calculate concrete token costs?

In the token cost calculator you can enter input and output volumes and see the monthly cost for each model update live. For a direct head-to-head comparison of two models, use the comparison pages.