AI coding agent cost per task
The metric that decides the real bill is not the price per token or the monthly plan: it is what one finished coding task costs. Every model here is priced on the same reference job, so the ranking is the cost to complete the work, cheapest first.
How much does an AI model cost per coding task?
On a modeled multi-file coding change, with the same task priced across 16 models, cost per task ranges from about $0.10 on DeepSeek V4 to $4.85 on Claude Fable 5: a 46x spread for one job. The reference task assumes 1.5 million input tokens (90% served from cache), 40,000 output tokens, and a single pass; the figure is what the per-token API rate bills to complete it, not a monthly plan price. Cheaper-listed models do not always finish cheaper, because output-heavy or long-reasoning models can cost more per task than their sticker rate suggests.
The cost-per-task ladder
Cost to complete the reference multi-file change, on a log axis because the figures span more than an order of magnitude. The cheapest open-weight models finish the job for a dime; the priciest US flagships cost roughly 46 times as much for the same task.
| Model | Cost per task |
|---|---|
| DeepSeek V4 | $0.10 |
| Gemini 3 Flash | $0.26 |
| Claude Haiku 4.5 | $0.48 |
| GPT-5.6 Luna | $0.52 |
| Kimi K2.7 Code | $0.56 |
| GLM-5.2 | $0.74 |
| Gemini 3.5 Flash | $0.79 |
| Qwen3.7 Max | $1.01 |
| Gemini 3.1 Pro | $1.05 |
| GPT-5.6 Terra | $1.31 |
| Claude Sonnet 5 | $1.46 |
| Claude Sonnet 4.6 | $1.46 |
| Claude Opus 4.8 | $2.42 |
| GPT-5.5 | $2.63 |
| GPT-5.6 Sol | $2.63 |
| Claude Fable 5 | $4.85 |
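The multiple of the cheapest model can be recomputed from the ladder. The sketch below uses the rounded dollar figures from the table, so the top multiple comes out slightly higher than the 46x headline, which presumably rests on unrounded costs; treat the output as approximate.

```python
# Cost per task (USD) for the multi-file reference change, copied from the table above.
costs = {
    "DeepSeek V4": 0.10,
    "Gemini 3 Flash": 0.26,
    "Claude Haiku 4.5": 0.48,
    "GPT-5.6 Luna": 0.52,
    "Kimi K2.7 Code": 0.56,
    "GLM-5.2": 0.74,
    "Gemini 3.5 Flash": 0.79,
    "Qwen3.7 Max": 1.01,
    "Gemini 3.1 Pro": 1.05,
    "GPT-5.6 Terra": 1.31,
    "Claude Sonnet 5": 1.46,
    "Claude Sonnet 4.6": 1.46,
    "Claude Opus 4.8": 2.42,
    "GPT-5.5": 2.63,
    "GPT-5.6 Sol": 2.63,
    "Claude Fable 5": 4.85,
}

# The baseline is the cheapest model on the ladder.
baseline = min(costs.values())

# Each model's cost expressed as a multiple of that baseline.
multiples = {name: cost / baseline for name, cost in costs.items()}

for name, multiple in sorted(multiples.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} ${costs[name]:.2f}  {multiple:5.1f}x")
```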
Capability against cost
Cost alone does not say whether a model can finish the work. Plotting the independent coding composite against cost per task shows the trade-off for the models that carry both an independent score and a verified price. Better value sits lower and further right: more measured ability for less money. Kimi K2.7 Code buys the most coding points per dollar here, GLM-5.2 matches Gemini 3.1 Pro on score at a lower cost, and the most capable model, Claude Fable 5, is also the most expensive to run.
| Model | Coding composite score (0-100) | Cost per task (USD) |
|---|---|---|
| Kimi K2.7 Code | 61 | $0.56 |
| GLM-5.2 | 69 | $0.74 |
| Qwen3.7 Max | 66 | $1.01 |
| Gemini 3.1 Pro | 69 | $1.05 |
| Claude Opus 4.8 | 74 | $2.42 |
| Claude Fable 5 | 77 | $4.85 |
Only 6 of the 16 priced models carry an independent coding composite so far. For the capability-per-dollar ranking across the full set, see the AI model value leaderboard.
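"Coding points per dollar" can be read as the composite score divided by the cost per task. That definition is an assumption for illustration, not necessarily the formula the value leaderboard uses. A minimal sketch over the six scored models, using the rounded figures from the table above:

```python
# (coding composite score, cost per task in USD) from the table above.
scored = {
    "Kimi K2.7 Code": (61, 0.56),
    "GLM-5.2": (69, 0.74),
    "Qwen3.7 Max": (66, 1.01),
    "Gemini 3.1 Pro": (69, 1.05),
    "Claude Opus 4.8": (74, 2.42),
    "Claude Fable 5": (77, 4.85),
}

# Assumed value metric: composite points bought per dollar of task cost.
points_per_dollar = {name: score / cost for name, (score, cost) in scored.items()}

# Rank from best value to worst.
ranked = sorted(points_per_dollar.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name:16s} {value:6.1f} points per dollar")
```

On these figures Kimi K2.7 Code ranks first and Claude Fable 5 last, which is the trade-off the paragraph above describes: the top scorer buys the fewest points per dollar.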
How the ranking shifts by task size
The same model is cheap on a small edit and dear on a long agentic run, because cost scales with tokens. The table prices every model on all three task sizes, sorted by the multi-file reference job.
| Model | Provider | One-file edit | Multi-file change | Agentic task | Coding composite (0-100) |
|---|---|---|---|---|---|
| DeepSeek V4 | DeepSeek | $0.02 | $0.10 | $0.21 | n/a |
| Gemini 3 Flash | Google | $0.04 | $0.26 | $0.52 | n/a |
| Claude Haiku 4.5 | Anthropic | $0.08 | $0.48 | $0.97 | n/a |
| GPT-5.6 Luna | OpenAI | $0.09 | $0.52 | $1.05 | n/a |
| Kimi K2.7 Code | Moonshot | $0.08 | $0.56 | $1.12 | 60.8 |
| GLM-5.2 | Zhipu | $0.11 | $0.74 | $1.47 | 68.8 |
| Gemini 3.5 Flash | Google | $0.13 | $0.79 | $1.57 | n/a |
| Qwen3.7 Max | Alibaba | $0.16 | $1.01 | $2.02 | 66.0 |
| Gemini 3.1 Pro | Google | $0.18 | $1.05 | $2.10 | 68.8 |
| GPT-5.6 Terra | OpenAI | $0.22 | $1.31 | $2.63 | n/a |
| Claude Sonnet 5 | Anthropic | $0.25 | $1.46 | $2.91 | n/a |
| Claude Sonnet 4.6 | Anthropic | $0.25 | $1.46 | $2.91 | n/a |
| Claude Opus 4.8 | Anthropic | $0.41 | $2.42 | $4.85 | 74.3 |
| GPT-5.5 | OpenAI | $0.45 | $2.63 | $5.25 | n/a |
| GPT-5.6 Sol | OpenAI | $0.45 | $2.63 | $5.25 | n/a |
| Claude Fable 5 | Anthropic | $0.82 | $4.85 | $9.70 | 76.5 |
Costs are modeled from verified per-token rates on the three task profiles defined below. Coding is the independent composite score (0-100) where one exists; n/a means the model is not yet scored.
What the reference task is
Every cost figure prices the same job, so the ranking compares models rather than tasks. The headline number uses the multi-file change profile: 1.5 million input tokens with 90% served from the prompt cache, 40,000 output tokens, and a single pass. That profile puts Claude Sonnet 5 at about $1.46, and it is the worked example behind what an AI coding agent costs per task. The table also prices a smaller one-file edit and a longer multi-step agentic session, so you can see the ranking move as the work grows.
How the number is computed
The cost of a task is the sum of three lines, each at the model's verified per-token rate: cache reads (the repeated context an agent re-reads each turn, billed at roughly a tenth of fresh input), fresh input, and output. Every per-token rate is read from the provider's official API pricing page and dated in the AI model release tracker. To change the assumptions, model your own task in the cost-per-task calculator, or put two models head to head with the model comparison.
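The three-line sum can be sketched in a few lines of Python. The `Rates` class and `cost_per_task` function are hypothetical names for illustration. The example rates ($3.00 fresh input, $0.30 cache read, $15.00 output per million tokens) are placeholders that are not quoted from any provider page; they were chosen so that the multi-file profile lands near the $1.46 shown for Claude Sonnet 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD. Values used below are illustrative."""
    input_per_m: float       # fresh (uncached) input
    cache_read_per_m: float  # input served from the prompt cache
    output_per_m: float      # generated output


def cost_per_task(rates: Rates, input_tokens: int, cached_share: float,
                  output_tokens: int) -> float:
    """Sum the three billing lines: cache reads, fresh input, and output."""
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    total = (cached * rates.cache_read_per_m
             + fresh * rates.input_per_m
             + output_tokens * rates.output_per_m)
    return total / 1_000_000


# Multi-file reference profile from the text: 1.5M input, 90% cached, 40k output.
example_rates = Rates(input_per_m=3.00, cache_read_per_m=0.30, output_per_m=15.00)
cost = cost_per_task(example_rates, 1_500_000, 0.90, 40_000)
print(f"${cost:.3f}")
```

With these placeholder rates the three lines are about $0.405 in cache reads, $0.45 in fresh input, and $0.60 in output. Output is the largest line even though it accounts for well under 3% of the tokens.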
The cheapest sticker is often not the cheapest task. A model with a low per-token rate can still cost more to finish a job when it generates more output or burns more reasoning tokens, the price reversal a 2026 Microsoft Research preprint found in 32% of model pairs. The mechanism, and what to measure instead, is the subject of the price reversal phenomenon.
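A toy example makes the reversal concrete. Both models and all token counts below are hypothetical and chosen only to show the mechanism; they are not taken from the preprint or from this page's price data. The "budget" model charges a third of the "premium" rate on every line but reads more context and emits far more output.

```python
def task_cost(input_rate: float, cache_rate: float, output_rate: float,
              input_tokens: int, cached_share: float, output_tokens: int) -> float:
    """USD cost of one task; all rates are USD per million tokens."""
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    total = cached * cache_rate + fresh * input_rate + output_tokens * output_rate
    return total / 1_000_000


# Hypothetical budget model: cheaper on every per-token line, but it takes
# more turns (2.0M input) and burns 250k output and reasoning tokens.
budget_model = task_cost(1.00, 0.10, 5.00, 2_000_000, 0.90, 250_000)

# Hypothetical premium model: three times the sticker rate, but it finishes
# on the reference profile (1.5M input, 40k output).
premium_model = task_cost(3.00, 0.30, 15.00, 1_500_000, 0.90, 40_000)

print(f"budget model:  ${budget_model:.2f}")
print(f"premium model: ${premium_model:.2f}")
print("price reversal" if budget_model > premium_model else "no reversal")
```

Under these assumptions the budget model costs about $1.63 per task against about $1.46 for the premium one, so the lower sticker rate loses once token consumption is counted.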
Where Artificial Analysis fits
The cost-per-task framing follows the Artificial Analysis Coding Agents leaderboard, an independent benchmark (v1.1 as of mid-2026) that reports a Coding Agent Index, time per task, and cost per task for each agent. Artificial Analysis measures cost empirically: the mean pay-per-token API cost to run an agent across a composite of agentic coding benchmarks (DeepSWE, Terminal-Bench v2, and SWE-Atlas-QnA), counting standard input, cached input, cache writes, and output. Those are their measured numbers, not reproduced here.
The figures on this page are modeled instead: published per-token rates applied to a transparent, fixed task profile. That is a deliberate trade-off. A modeled figure is reproducible and comparable across every model on the identical job, but it is an estimate of token consumption, not a measurement of it. Treat the result as the order of magnitude and the ranking, not your exact invoice, and use Artificial Analysis for the empirical view of how agents actually consume tokens on real benchmark tasks.
Frequently asked questions
- How much does an AI model cost per coding task?
- On a modeled multi-file change (1.5 million input tokens, 90% cached, 40,000 output tokens, one pass), cost per task runs from about $0.10 on DeepSeek V4 to $4.85 on Claude Fable 5. Most real coding tasks land between roughly $0.10 and $5, set by how many tokens the agent burns rather than the monthly subscription price.
- Which AI model is cheapest per coding task?
- On this reference task, DeepSeek V4 is cheapest to finish at about $0.10, followed by Gemini 3 Flash and Claude Haiku 4.5. The ranking shifts with task shape: an output-heavy or long-reasoning job rewards different models, which is why cost per task, not the per-token sticker, is the number to compare.
- What is cost per task and how is it measured?
- Cost per task is the API spend to complete one unit of real work, not a price per token or per month. Artificial Analysis measures it empirically as the mean pay-per-token cost across an agentic coding benchmark. The figures here are modeled instead: published per-token rates applied to a transparent, stated task profile, so they are reproducible and comparable across models on the same job.
- Does the cheapest model actually finish the task?
- Not always, and that is the trade-off this table does not settle on its own. DeepSeek V4 is cheapest per task but is not the most capable; on the independent coding composite, the priciest model here, Claude Fable 5, scores highest. For the capability-per-dollar ranking across more models, see the value leaderboard.
Sources
- Artificial Analysis (2026). Coding Agents leaderboard (Coding Agent Index, time per task, and cost per task; methodology v1.1). Independent benchmark. artificialanalysis.ai/agents/coding-agents
- Bai, L., et al. (2026). How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks. arXiv preprint arXiv:2604.22750. arxiv.org/abs/2604.22750
- Capital & Compute. AI model registry (verified per-token API prices, each linked to a provider source). /ai-models/
- Capital & Compute. AI model value leaderboard (independent coding composite scores). /ai-model-leaderboard/