Capital & Compute
ai · pricing · economics · coding-agents

MAI-Code-1-Flash Cost Per Task: Cheapest Coding Model?

Microsoft calls MAI-Code-1-Flash its cheapest coding model. In Copilot it bills like Claude Haiku 4.5 (0.33x); its token edge holds only on easy benchmarks.

By Capital & Compute

Microsoft is selling MAI-Code-1-Flash as the cheapest way to do agentic coding. It is the company’s first in-house coding model, built for GitHub Copilot, and the launch pitch is two numbers welded together: up to 60% fewer tokens than Claude Haiku 4.5, and a win on every coding benchmark Microsoft ran. Ask the narrower question this site cares about, what one task actually costs, and the weld cracks.

Inside Copilot, the only place you can run the model today, MAI-Code-1-Flash bills at exactly the same rate as Haiku 4.5. Its token-efficiency edge is large on one benchmark and close to nothing on the hardest one. And the benchmark win it leads with rests on Microsoft’s own re-measurement of Haiku, not Anthropic’s published score. So is it the cheapest coding model? It depends entirely on how you are billed, and for the way it actually ships, the answer is no.

Is MAI-Code-1-Flash actually the cheapest coding model?

For the typical GitHub Copilot user, no. MAI-Code-1-Flash and Claude Haiku 4.5 both bill at a 0.33x premium-request multiplier, so picking one over the other saves nothing per request. The only cost advantage is token efficiency, which is roughly 60% on SWE-Bench Verified but only about 6% on the harder SWE-Bench Pro, and with no public API there is no per-token bill for it to reduce.

That is the whole post in three sentences, but the reason the marketing works is worth seeing in detail, because Microsoft has picked the one cost unit that flatters the model and quietly dropped the two that don’t.

What MAI-Code-1-Flash is

The facts that matter come straight from Microsoft’s MAI-Code-1-Flash model card, the authoritative source. It is a sparse Mixture-of-Experts transformer with 137B total parameters and 5B active, a 256K-token context window, trained between March and May 2026 on data with a December 2025 cutoff, and released on June 2, 2026.

The 137B-versus-5B confusion that ran through the early coverage is not a contradiction. Both numbers are correct. 137B is the full parameter pool; 5B is the slice that actually fires on any given token. That is what makes it cheap to serve, and it is the same architecture pattern most of the current cheap-agent field uses.

What it is not, yet, is a model you can use anywhere. The card is blunt about this: “Available only in GitHub Copilot in Visual Studio Code at launch.” Availability has since expanded to other Copilot surfaces such as the CLI, JetBrains, and Xcode, but not beyond Copilot. Any future API release, the card notes, “would be accompanied by an update to the relevant documentation.” There is no such update. No API, no third-party hosting, no way to run the weights yourself. That single constraint shapes everything about its cost.

The three ways to price it, and why only one applies today

A coding model can be priced in three units, and they don’t agree. Microsoft leads with the third.

Premium requests. This is how Copilot actually charges you, and it is the only unit that applies to the shipped product. GitHub’s model multiplier table lists MAI-Code-1-Flash at a 0.33x multiplier, with a footnote that “the multiplier for MAI-Code-1-Flash is a promotional rate.” Claude Haiku 4.5 sits at the same 0.33x, with no promotional caveat. So inside Copilot, the two models cost identical fractions of a premium request, and the only one whose price is flagged to move is Microsoft’s. If you want the mechanics of how premium requests translate into a monthly bill, that lives in the GitHub Copilot pricing breakdown.

Per token. Secondary coverage and a GitHub-listed figure put the per-token rate at $0.75 per million input tokens, $0.075 cached, and $4.50 output. Take those as preliminary: the official model card still says pricing is “To be finalized,” and there is no metered API to charge them against. For reference, this site carries Claude Haiku 4.5 at $1.00/$5.00 per million tokens across its cost-per-task coverage. If the reported MAI rates hold, that is roughly 25% cheaper input and 10% cheaper output per token. Modest, not the order-of-magnitude cut you see from Cursor’s Composer 2.5 at about $0.07 per task.

Tokens per task. This is the unit Microsoft actually means when it says “efficient,” and it is the only place the model has a real edge. It is also the one number the marketing reports honestly, then over-generalizes. We will spend the most time here, because it is where the cost story lives.

The reason only the first unit matters today is the no-API constraint. You cannot pay per token for a model with no token meter. You pay in premium requests, and there MAI ties Haiku. The dollar question, the one this site usually answers with cost to finish the work rather than cost per token, simply has no independent answer yet. More on that below.
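The per-token comparison is simple enough to check by hand. A minimal sketch, assuming the reported MAI rates hold (they are preliminary) and using this site’s Haiku 4.5 rates; the function and variable names are illustrative and do not come from any Microsoft or Anthropic tooling:

```python
# Per-million-token rates in USD.
# MAI figures are the preliminary, unconfirmed rates quoted above;
# Haiku figures are the ones this site carries for Claude Haiku 4.5.
MAI_RATES = {"input": 0.75, "output": 4.50}
HAIKU_RATES = {"input": 1.00, "output": 5.00}


def percent_cheaper(candidate: float, baseline: float) -> float:
    """Return how much cheaper `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100


discounts = {
    kind: percent_cheaper(MAI_RATES[kind], HAIKU_RATES[kind])
    for kind in MAI_RATES
}

for kind, pct in discounts.items():
    print(f"{kind}: {pct:.0f}% cheaper per token")
```

The discount is a property of the rate card only. Until a metered API exists, there is no bill to apply it to.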

Tokens per task: where the efficiency is real, and where it evaporates

Microsoft’s model card reports average token usage per completed task for both models, run in the same harness. That is the right measurement, and it tells a sharper story than the headline does.

Tokens per task: MAI-Code-1-Flash vs Claude Haiku 4.5

| Benchmark | Claude Haiku 4.5 | MAI-Code-1-Flash | Gap |
| --- | --- | --- | --- |
| SWE-Bench Verified | 27.3K | 10.8K | 60% |
| Terminal Bench 2 | 25.0K | 21.6K | 14% |
| SWE-Bench Multilingual | 17.2K | 15.3K | 11% |
| SWE-Bench Pro | 29.8K | 28.0K | 6% |

Average tokens to finish one task, MAI-Code-1-Flash vs Claude Haiku 4.5, across the four coding benchmarks on Microsoft's model card. The gap is the efficiency claim. It is enormous on SWE-Bench Verified (the source of the '60% fewer tokens' line) and shrinks to almost nothing on the harder, more realistic SWE-Bench Pro. Lower is cheaper.

Source: Microsoft, MAI-Code-1-Flash model card, June 2026

On SWE-Bench Verified, MAI-Code-1-Flash finishes the average task in 10.8K tokens against Haiku’s 27.3K. That is the 60% reduction the launch leads with, and it is genuine. But SWE-Bench Verified is the most saturated, most-trained-on coding benchmark in the set. Move to SWE-Bench Pro, the harder and more representative test of real repository work, and the two models spend almost the same: 28.0K tokens versus 29.8K, a 6% gap. SWE-Bench Multilingual lands at 11% and Terminal Bench 2 at 14%.
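Those gaps follow directly from the token counts on the model card. A short sketch of the arithmetic; the dictionary layout and function name are illustrative choices:

```python
# Average tokens per completed task, in thousands, as reported on
# Microsoft's model card: (Claude Haiku 4.5, MAI-Code-1-Flash).
TOKENS_K = {
    "SWE-Bench Verified": (27.3, 10.8),
    "Terminal Bench 2": (25.0, 21.6),
    "SWE-Bench Multilingual": (17.2, 15.3),
    "SWE-Bench Pro": (29.8, 28.0),
}


def token_saving_pct(haiku_k: float, mai_k: float) -> float:
    """Percent fewer tokens MAI spends relative to Haiku on one benchmark."""
    return (haiku_k - mai_k) / haiku_k * 100


savings = {
    name: token_saving_pct(haiku_k, mai_k)
    for name, (haiku_k, mai_k) in TOKENS_K.items()
}

# Print from the largest saving to the smallest.
for name, pct in sorted(savings.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.0f}% fewer tokens")
```

Only one of the four rows comes anywhere near the headline figure.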

The pattern is the point. The efficiency advantage shrinks as the task gets harder. On the easy benchmark it is huge; on the kind of work you would actually hire a coding agent for, it is close to a rounding error. “Up to 60% fewer tokens” is true the way “up to 70% off” is true at a sale where one item is marked down.

The benchmark caveat: whose Haiku score?

The efficiency story at least uses honest numbers. The capability story has a problem hiding in a footnote.

Microsoft’s headline is that MAI-Code-1-Flash beats Claude Haiku 4.5 on every coding benchmark, including 71.6 versus 66.6 on SWE-Bench Verified. The model card footnotes the Haiku column: “Numbers from internal benchmark system with production harness.” Those Haiku scores are Microsoft’s own re-measurement of Anthropic’s model inside Microsoft’s harness, not the numbers Anthropic published.

It matters, because Anthropic reports Claude Haiku 4.5 at 73.3% on SWE-Bench Verified, averaged over 50 trials. Put the three numbers side by side:

| SWE-Bench Verified | Pass rate (%) |
| --- | --- |
| Claude Haiku 4.5 (Anthropic, published) | 73.3 |
| MAI-Code-1-Flash (Microsoft) | 71.6 |
| Claude Haiku 4.5 (Microsoft’s re-measurement) | 66.6 |

Against the score Anthropic actually publishes, MAI-Code-1-Flash trails Haiku on the benchmark Microsoft chose to lead with. The “win” exists only because Microsoft’s harness scores Haiku nearly seven points lower than Anthropic does. To be fair to Microsoft, running every model in one harness is the defensible way to compare, and the card says so. But the company then took an in-harness result and marketed it as a flat “beats Haiku 4.5,” without noting that Haiku’s published number is higher than its own model’s.
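How much the result depends on the baseline comes down to two subtractions. A sketch using the three SWE-Bench Verified scores above, with illustrative names:

```python
# SWE-Bench Verified pass rates, in percent.
MAI_SCORE = 71.6
HAIKU_SCORES = {
    "Anthropic, published": 73.3,
    "Microsoft, in-harness": 66.6,
}

# MAI's margin over Haiku under each choice of Haiku baseline.
margins = {
    source: round(MAI_SCORE - haiku_score, 1)
    for source, haiku_score in HAIKU_SCORES.items()
}

# How far apart the two measurements of the same Haiku model are.
harness_gap = round(
    HAIKU_SCORES["Anthropic, published"] - HAIKU_SCORES["Microsoft, in-harness"], 1
)

for source, margin in margins.items():
    verdict = "leads" if margin > 0 else "trails"
    print(f"vs Haiku ({source}): MAI {verdict} by {abs(margin):.1f} points")
print(f"Gap between the two Haiku measurements: {harness_gap:.1f} points")
```

The sign of the margin flips with the baseline, and the gap between the two Haiku measurements is larger than either margin.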

There is a second sleight worth naming, and it is the analytical core of the whole launch. Look at where MAI is genuinely, decisively better:

| Benchmark | MAI pass | Haiku pass | MAI lead | MAI token saving |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | 71.6 | 66.6 | +5.0 | ~60% |
| SWE-Bench Pro | 51.2 | 35.2 | +16.0 | ~6% |
| SWE-Bench Multilingual | 65.5 | 62.7 | +2.8 | ~11% |
| Terminal Bench 2 | 54.8 | 41.6 | +13.2 | ~14% |

The capability wins and the efficiency wins are anti-correlated. Where MAI-Code-1-Flash is clearly the stronger model (SWE-Bench Pro, +16 points; Terminal Bench 2, +13 points), it is barely more efficient (6% and 14% fewer tokens). Where it is dramatically more efficient (SWE-Bench Verified, 60% fewer tokens), it is barely ahead, and behind once you use Haiku’s published score. The marketing fuses “+16 points” from one benchmark with “60% fewer tokens” from another into a single impression of a model that is both far better and far cheaper. No single workload delivers both at once.
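One way to put a number on that relationship is a Pearson correlation between MAI’s lead and its token saving. This is a rough sketch, not a statistical result: it uses only the four rows above and the rounded saving percentages, and the helper is a hand-rolled illustration. Four points are far too few to read much into the magnitude, and SWE-Bench Multilingual is small on both axes, which pulls the figure toward zero.

```python
from math import sqrt

# (benchmark, MAI lead in points, approximate MAI token saving in percent)
ROWS = [
    ("SWE-Bench Verified", 5.0, 60),
    ("SWE-Bench Pro", 16.0, 6),
    ("SWE-Bench Multilingual", 2.8, 11),
    ("Terminal Bench 2", 13.2, 14),
]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


leads = [lead for _, lead, _ in ROWS]
token_savings = [saving for _, _, saving in ROWS]
correlation = pearson(leads, token_savings)

print(f"Pearson r between lead and token saving: {correlation:.2f}")
```

The coefficient comes out negative, consistent with the two largest capability leads sitting next to two of the smaller token savings.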

So what does a task actually cost?

Here is the honest answer the title promises, in the units that actually exist.

If you run Copilot, a MAI-Code-1-Flash task costs the same as a Haiku 4.5 task: 0.33 of a premium request, with MAI’s rate the one flagged to rise. The model picker choice between them is a capability and latency decision, not a cost decision. On the harder agentic work that eats most of a coding budget, even the token-efficiency tiebreaker mostly disappears.
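A sketch of the Copilot arithmetic, assuming each prompt consumes one premium request scaled by the model’s multiplier. The prompt count of 300 is a made-up workload, not a Copilot plan allowance, and the names are illustrative:

```python
# Premium-request multipliers from GitHub's model multiplier table.
# MAI-Code-1-Flash's 0.33x is flagged as a promotional rate.
MULTIPLIERS = {
    "MAI-Code-1-Flash": 0.33,
    "Claude Haiku 4.5": 0.33,
}

# Hypothetical workload: number of prompts sent in a billing period.
HYPOTHETICAL_PROMPTS = 300


def premium_requests_used(model: str, prompts: int) -> float:
    """Premium requests consumed, assuming one request per prompt times the multiplier."""
    return MULTIPLIERS[model] * prompts


usage = {
    model: premium_requests_used(model, HYPOTHETICAL_PROMPTS)
    for model in MULTIPLIERS
}

for model, used in usage.items():
    print(f"{model}: {used:.0f} premium requests for {HYPOTHETICAL_PROMPTS} prompts")
```

Whatever workload you plug in, the two lines match for as long as both multipliers stay at 0.33x.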

The dollar-per-task figure that this site usually publishes, the one Artificial Analysis measures by running each model in the harness it ships in, does not exist for MAI-Code-1-Flash. There is no entry on Artificial Analysis. Independent trackers list it as “API access coming soon” under a proprietary license. With no API and no third-party access, no one outside Microsoft can measure what it costs to finish a representative task in dollars. That absence is itself the finding: a model marketed on cost that cannot yet be independently priced.

0.33x premium-request multiplier: same as Haiku 4.5; promotional
60% / 6% token saving: SWE-Bench Verified vs SWE-Bench Pro
71.6 vs 73.3 on SWE-Bench Verified: MAI vs Haiku's published score
No API: Copilot-only; pricing 'to be finalized'

How it fits the rest of the cheap-agent field

Strip away the marketing and MAI-Code-1-Flash is a competent small coding model that does one useful thing well: it spends fewer tokens on easy tasks. That is worth something. It is not a price revolution.

The genuinely cheap options in the 2026 coding-agent field win by a different mechanism. Cursor’s Composer 2.5 finishes a task for roughly a tenth to a sixtieth of a frontier model’s cost by post-training an open checkpoint and metering against a subscription. Gemini 3.5 Flash undercuts on a published per-token rate you can actually pay. MAI-Code-1-Flash has neither lever: its in-Copilot price ties Haiku, and its per-token rate is unconfirmed and unusable. Developers on the Hacker News thread made the same point from the other direction, noting that Microsoft compared only against Anthropic’s small model and skipped the open-weight competitors that beat Haiku in real coding for a fraction of the subscription cost.

This is the same trap this site keeps documenting: a low sticker number that the real bill drifts above once you count consumption and tier. It is the price-reversal pattern where cheaper-listed models cost more to finish the work. MAI-Code-1-Flash does not reverse on Haiku; it ties it. But the headline-to-reality drift is the same mechanism: a single flattering number standing in for a cost that, measured honestly, is unremarkable.


Who should use it

Use MAI-Code-1-Flash if you already live in Copilot and your work skews toward shorter, well-scoped tasks where the token savings actually show up, and where 0.33x is your budget either way. For that profile it is a reasonable default, and the fact that Microsoft trained it directly on the Copilot harness means it behaves well inside the loop it was built for.

Look elsewhere if cost is the actual constraint. If you can run anything, the genuinely cheap per-task options cost a fraction of a premium request’s worth of tokens and let you pay per token if you want to. And if your work is the hard, multi-file agentic kind, the SWE-Bench Pro row is the one to read: MAI is more capable than Haiku there, but its efficiency edge is gone, so you are choosing on capability, not cost. Watch two things before committing: whether the 0.33x promotional rate survives the rollout to Business and Enterprise, and whether an API ever ships to make independent cost-to-run measurable. Until then, the model lives where Microsoft can control how it is priced and how it is compared.

Frequently asked questions

Is MAI-Code-1-Flash cheaper than Claude Haiku 4.5?
Not inside GitHub Copilot. Both models bill at a 0.33x premium-request multiplier, so a task costs the same on either, and only MAI-Code-1-Flash's rate is flagged as promotional. Its only cost edge is token efficiency, which is about 60% fewer tokens on SWE-Bench Verified but only about 6% on harder tasks like SWE-Bench Pro.
Can I use MAI-Code-1-Flash through an API or outside GitHub Copilot?
No. As of June 2026 it is available only inside GitHub Copilot, across surfaces such as VS Code, the CLI, JetBrains, and Xcode. The official model card says any future API release would be documented, and no such release exists. There is no public API and no third-party hosting.
How much does MAI-Code-1-Flash cost per token?
Secondary coverage and a GitHub-listed figure report $0.75 per million input tokens, $0.075 cached, and $4.50 output. Treat these as preliminary: the official Microsoft model card still lists pricing as "to be finalized," and there is no metered API to charge them against today.
Does MAI-Code-1-Flash really beat Claude Haiku 4.5 on benchmarks?
It depends whose Haiku score you use. Microsoft reports 71.6 vs 66.6 on SWE-Bench Verified, but the 66.6 is Microsoft's own re-measurement of Haiku in its harness. Anthropic publishes Haiku 4.5 at 73.3, higher than MAI-Code-1-Flash. On SWE-Bench Pro, MAI does lead clearly, by 16 points.
How many parameters does MAI-Code-1-Flash have?
It is a sparse Mixture-of-Experts model with 137B total parameters and 5B active per token, with a 256K-token context window. The two parameter figures are not a contradiction: 137B is the full pool and 5B is the slice that fires on any given token.

