Leaderboard · Updated June 27, 2026

The AI model value leaderboard

Name: AI model value leaderboard
Creator: Capital & Compute
License: https://creativecommons.org/licenses/by/4.0/

Most rankings tell you which model scores highest. This one also tells you which model is worth the money. Each LLM is rated two ways: its independent benchmark score, and its value, the points it buys you per dollar of tokens. The two orders are not the same.

Which AI model is the best value right now?

On benchmark scores alone, the strongest LLM you can buy here is Claude Opus 4.8 (74.3 of 100 on the coding composite). But cost flips the ranking: MiniMax M3 delivers about 111.6 coding points per dollar of blended token price, roughly 15.1x the value of Claude Opus 4.8. The cheaper models, mostly from Chinese labs, win on value; the priciest US flagships win on raw capability. Which one is "best" depends entirely on whether you are buying ability or buying ability per dollar.

Best value, coding: MiniMax M3, 111.6 coding points per dollar (blended)

Top coding score (buyable): Claude Opus 4.8, 74.3 of 100 on the coding composite

Cheapest to run: MiniMax M3, about $0.53 blended (3:1) price per Mtok, from its listed $0.30 input and $1.20 output

Models tracked: 12 scored on coding, 21 on intelligence

The value ladder, in one chart

Coding points per dollar of blended token price, for the models you can buy today. The order diverges sharply from the raw-score ranking: Claude Opus 4.8, the top buyable score, sits last on value, while the cheapest capable models sit at the top because they score within range of the frontier at a fraction of the price.

AI model coding value: composite coding score per dollar of blended token price
Model	Coding value (pts/$)
MiniMax M3	111.6
Qwen3.7 Plus	99.8
Nemotron 3 Ultra	36.5
Kimi K2.7 Code	35.5
Devstral 2	34.8
GLM-5.2	32.0
Llama 4 Maverick	31.8
Grok 4.3	22.5
Qwen3.7 Max	17.6
Gemini 3.1 Pro	15.3
Claude Opus 4.8	7.4

Coding value (composite score divided by blended token price) for buyable models. Higher is more coding ability per dollar. Blended price weights input to output 3 to 1. Source: Price Per Token (composite scores) and Capital & Compute verified pricing.

The same picture, on general intelligence

Not every model has a coding score, but every buyable model has a score on the broader intelligence composite, so this ladder includes all of them: DeepSeek, the GPT-5 tiers, Grok, Sonnet and Haiku among others. The story holds: the cheap models lead on value, and the priciest flagships (GPT-5.2 Pro, Claude Opus 4.8, Grok 4) fall to the bottom, where a top score cannot outrun a high token price.

AI model intelligence value: composite intelligence score per dollar of blended token price
Model	Intelligence value (pts/$)
MiniMax M3	84.6
Qwen3.7 Plus	69.6
DeepSeek V4	57.4
Nemotron 3 Ultra	28.0
Llama 4 Maverick	27.9
Kimi K2.7 Code	24.5
GLM-5.2	23.8
Devstral 2	21.3
Grok 4.3	15.9
Qwen3.7 Max	12.3
Claude Haiku 4.5	11.9
Gemini 3.1 Pro	10.3
Gemini 3.5 Flash	10.3
GPT-5.3 Codex	9.2
Gemini 3 Pro Preview	7.4
Claude Sonnet 4.6	5.7
Claude Opus 4.8	5.6
Grok 4	5.6
GPT-5.2	5.4
GPT-5.2 Pro	0.7

Intelligence value (composite score divided by blended token price) for every buyable model. Higher is more measured reasoning per dollar. The expensive flagships rank low here despite strong raw scores. Source: Price Per Token (composite scores) and Capital & Compute / source-dataset pricing.

Rank it yourself

Switch the benchmark between coding and general intelligence, and switch the sort between value and raw score. The default view is coding, ranked by value.

Best LLM by coding, ranked by value

Composite of code generation, understanding, and problem-solving (0-100). Value is coding points per dollar of blended token price.

Benchmark: Coding or Intelligence. Coding covers code-writing and problem-solving; Intelligence covers general reasoning. Both are 0-100 composite scores.

Sort by: Value or Raw score. Value is points per dollar (best bang for the buck); Raw score is the highest benchmark, price aside.

What is value? It is how many coding points you get per dollar of tokens: coding score ÷ blended price, where blended price = (3 × input + output) ÷ 4 per million tokens (input is weighted higher because coding agents read far more than they write). A higher value means more measured ability per dollar; it is a cost-efficiency measure, not a verdict on which model is best.
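The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the site's own code: the function names `blended_price` and `value` are hypothetical, and the scores and prices are copied from the coding table on this page.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float) -> float:
    """Blended $/Mtok, weighting input to output 3 to 1."""
    return (3 * input_per_mtok + output_per_mtok) / 4


def value(score: float, input_per_mtok: float, output_per_mtok: float) -> float:
    """Composite score divided by the blended token price (points per dollar)."""
    return score / blended_price(input_per_mtok, output_per_mtok)


# Scores and prices as listed in the coding table on this page.
minimax = value(58.6, 0.30, 1.20)   # blended price 0.525
opus = value(74.3, 5.00, 25.00)     # blended price 10.0

print(f"MiniMax M3:      {minimax:.1f} pts/$")  # 111.6
print(f"Claude Opus 4.8: {opus:.1f} pts/$")     # 7.4
```

Both results match the Value column, which suggests the table is computed from the listed prices with no further adjustment.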

#	Model	Provider	Coding	Input $/Mtok	Output $/Mtok	Value (pts/$)
1	MiniMax M3	MiniMax	58.6	$0.30	$1.20	111.6 (best value)
2	Qwen3.7 Plus	Alibaba	55.9	$0.32	$1.28	99.8
3	Nemotron 3 Ultra	Nvidia	49.3	$0.60	$3.60	36.5
4	Kimi K2.7 Code	Moonshot ✓	60.8	$0.95	$4.00	35.5
5	Devstral 2	Mistral	31.3	$0.90	$0.90	34.8
6	GLM-5.2	Zhipu ✓	68.8	$1.40	$4.40	32.0
7	Llama 4 Maverick	Meta	16.3	$0.35	$1.00	31.8
8	Grok 4.3	xAI	35.2	$1.25	$2.50	22.5
9	Qwen3.7 Max	Alibaba ✓	66.0	$2.50	$7.50	17.6
10	Gemini 3.1 Pro	Google ✓	68.8	$2.00	$12.00	15.3
11	Claude Opus 4.8	Anthropic ✓	74.3	$5.00	$25.00	7.4
12	Claude Fable 5 (unavailable)	Anthropic ✓	76.5	$10.00	$50.00	3.8

Switch the benchmark or sort to re-rank. Scores are 0-100 composites from the source dataset. A ✓ next to the provider marks a price reconciled with our verified registry against the provider source; the rest are the author-direct or endpoint rate reported by the source dataset, not yet independently re-verified.
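The two sort orders can be reproduced offline with a short Python sketch. It assumes a hypothetical `Row` record and uses four rows copied from the table above; it is an illustration of the re-ranking, not the page's actual implementation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float       # coding composite, 0-100
    input_usd: float   # dollars per million input tokens
    output_usd: float  # dollars per million output tokens


def value(row: Row) -> float:
    """Score divided by the 3:1 blended token price."""
    blended = (3 * row.input_usd + row.output_usd) / 4
    return row.score / blended


# A subset of the coding table, copied as listed.
rows = [
    Row("MiniMax M3", 58.6, 0.30, 1.20),
    Row("GLM-5.2", 68.8, 1.40, 4.40),
    Row("Llama 4 Maverick", 16.3, 0.35, 1.00),
    Row("Claude Opus 4.8", 74.3, 5.00, 25.00),
]

by_value = sorted(rows, key=value, reverse=True)
by_score = sorted(rows, key=lambda r: r.score, reverse=True)

print("By value:    ", [r.model for r in by_value])
print("By raw score:", [r.model for r in by_score])
```

On these four rows, Claude Opus 4.8 is first by raw score and last by value, while MiniMax M3 moves from third to first.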

How to read this leaderboard

Two grounded inputs, one derived number. The benchmark scores are composite Coding and Intelligence scores read from the Price Per Token dataset, which aggregates independent benchmarks and cites Artificial Analysis, the HuggingFace Open LLM Leaderboard, and LayerLens. They are composites on a 0-100 scale, not a single named test. The prices carry a check when they are reconciled with this site's own verified registry, the same numbers behind the model release tracker; the rest are the author-direct or endpoint rate reported by the source dataset, labeled per row and not yet independently re-verified. Each links to its provider source.

From those two, the leaderboard computes value: the benchmark score divided by a blended token price, where blended price weights input and output tokens 3 to 1, ((3 × input) + output) ÷ 4. The 3:1 mix mirrors how an agentic coding session actually bills: it reads far more context than it writes. Value is a cost-efficiency measure, not a verdict on quality. A model can top the value ranking and still be the wrong choice for work that needs the highest absolute score. It is the same lesson as the price reversal in per-task cost: the headline number and the number that matters are rarely the same.
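A worked example makes the gap between headline score and value concrete. GLM-5.2 and Gemini 3.1 Pro post the same coding composite, 68.8, yet their listed prices put them far apart on value. The symbols below (S for score, P for price, V for value) are notation introduced here for illustration; the figures are from the coding table.

```latex
\begin{align*}
P_{\text{blend}} &= \frac{3\,P_{\text{in}} + P_{\text{out}}}{4},
  \qquad V = \frac{S}{P_{\text{blend}}} \\[4pt]
\text{GLM-5.2:} \quad
  P_{\text{blend}} &= \frac{3(1.40) + 4.40}{4} = 2.15,
  \qquad V = \frac{68.8}{2.15} = 32.0 \\[4pt]
\text{Gemini 3.1 Pro:} \quad
  P_{\text{blend}} &= \frac{3(2.00) + 12.00}{4} = 4.50,
  \qquad V = \frac{68.8}{4.50} \approx 15.3
\end{align*}
```

Same score, roughly twice the value: the whole difference comes from the denominator.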

The independence rule

This leaderboard sells no placement. Ranking is never for sale, and no model is promoted for payment. The entire point of a value table is to be a neutral referee of what each model actually costs to use; the day a vendor could pay to look cheaper, it would be worthless. The benchmark scores come from an independent third party; the prices come from primary provider sources.

What is not here, and why

A model is listed only once an independent composite score exists for it, so the table is not padded with vendor-reported numbers. That leaves a few notable models tracked but not yet ranked:

GPT-5.5 / GPT-5.6 Sol, Terra, Luna. OpenAI's newest flagships. GPT-5.6 was previewed June 26, 2026 in a limited, US-government-gated rollout, and GPT-5.5 has no composite in the source dataset yet, so neither is scored. GPT-5.2 Pro and GPT-5.3 Codex represent OpenAI on the board for now.
Cohere North Mini Code. Free on hosted endpoints and open-weight, so a per-token value score is undefined. It posts a 33.4 Artificial Analysis Coding Index; the real cost is self-hosted compute, not a token rate.
Smaller and older variants. Models below roughly 10B parameters, superseded 2024-era releases (Claude 3.5, GPT-4 Turbo, o1), and narrowly tracked or unpriced entries are left off to keep the board to current, recognizable, buyable models.
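The Cohere North Mini Code case shows why a free model cannot be ranked: with a token price of zero, the value formula divides by zero. A minimal Python sketch of one way to handle that, assuming a hypothetical helper `value_or_none` that returns `None` rather than a number when no token price applies:

```python
from typing import Optional


def value_or_none(score: float, input_usd: float, output_usd: float) -> Optional[float]:
    """Points per dollar, or None when the blended price is zero (value undefined)."""
    blended = (3 * input_usd + output_usd) / 4
    if blended <= 0:
        return None  # free endpoint: no per-token value can be computed
    return score / blended


# 33.4 is the Artificial Analysis Coding Index quoted above, used here only
# as a stand-in score; the zero prices represent a free hosted endpoint.
print(value_or_none(33.4, 0.0, 0.0))     # None
print(value_or_none(58.6, 0.30, 1.20))   # MiniMax M3, about 111.6
```

Returning `None` keeps such a model out of any sort by value without assigning it an artificially infinite score.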

For per-token rates and release dates across every model the site follows, see the AI model release tracker. To turn these rates into the cost of a real job, use the cost-per-task calculator or put two models head to head. To pay nothing at all, see which AI models are free to use and good enough to ship with.

Frequently asked questions

What is the best LLM for coding in 2026?

Among models you can actually buy, Claude Opus 4.8 posts the highest coding composite in this dataset, 74.3 of 100. (Claude Fable 5 scores higher at 76.5 but is suspended worldwide under a US export-control directive, so it is listed for reference only.) On a value basis, points bought per dollar, cheaper models such as MiniMax M3 lead instead, because they score within range of the frontier at a fraction of the token price.

What is the best value AI model?

On the coding composite, MiniMax M3 is the value leader at about 111.6 points per dollar of blended price, roughly 15.1 times Claude Opus 4.8. Other low-cost models cluster near the top too: Qwen3.7 Plus and Kimi K2.7 Code on coding, and DeepSeek V4 on the intelligence board. Value rewards low price, so it favors capable cheap models over the most expensive flagships: GPT-5.2 Pro, at $21/$168 per Mtok, lands last on intelligence value despite a top-tier score.

How is the value score calculated?

Value equals the benchmark composite score divided by a blended token price. The blended price weights input and output tokens 3 to 1: (3 times input + output) divided by 4, in dollars per million tokens. The 3:1 mix reflects agentic coding, which reads far more context than it writes. A higher value means more measured ability per dollar; it is a cost-efficiency measure, not a quality ranking on its own.

Where do the benchmark scores come from?

The composite Coding and Intelligence scores are read from the Price Per Token dataset, which aggregates independent benchmarks and cites Artificial Analysis, the HuggingFace Open LLM Leaderboard, and LayerLens. They are composite scores on a 0-100 scale, not a single named test. Prices marked with a check are reconciled with this site's own verified registry against the provider source; the rest are the author-direct or endpoint rate reported by the source dataset, labeled per row and not yet independently re-verified. This leaderboard sells no placement: ranking is never for sale.

Why are GPT-5.5 and GPT-5.6 not on the leaderboard?

They are tracked on the model registry but not yet scored. GPT-5.6 (Sol, Terra, Luna) was previewed on June 26, 2026 in a limited, US-government-gated rollout, and GPT-5.5 has no composite in the source benchmark dataset yet. A model is added here only once an independent composite score exists for it, so the ranking is not padded with vendor-reported numbers.

Sources

Price Per Token. LLM API Pricing and Benchmarks dataset (composite Coding and Intelligence scores). Scores read 2026-06-27. https://pricepertoken.com/
Artificial Analysis. Independent LLM benchmarks and intelligence index (cited by the source dataset as a benchmark origin). https://artificialanalysis.ai/
Capital & Compute. AI model registry (verified per-token API prices, each linked to a provider source). /ai-models/

Machine-readable data: /ai-model-leaderboard.json.
