MiniMax

MiniMax M3

1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench.

Context1,048,576-token context — matches GLM-5.2 and Fable 5
Pricing$0.30 / 1M input tokens, $1.50 / 1M output
Tasks tested42
Avg score7.96/10 average
Medals🥇12 🥈11 🥉8
Release2026-06-18

Reference benchmarks for MiniMax M3

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for MiniMax M3 is honest about what's measured.

Context window
1,048,576 tokens
source: /openrouter
Per-token cost
$0.30 / M input · $1.50 / M output
source: /openrouter

What is MiniMax M3?

MiniMax M3 is the MiniMax frontier model with a 1,048,576-token context — matches GLM-5.2 and Fable 5 context window, released 2026-06-18. Tagline: 1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench..

Pricing detail. MiniMax M3 is the cheapest 1M-context frontier model on the bench — roughly 1/200th the per-call cost of OpenRouter Fusion and 1/30th of Claude Opus 4.8. Designed for high-volume agent workloads where context length matters but per-call budget is tight.

How I use it inside the Agent OS. Bench prompts dispatched via OpenRouter. Scored by Claude judge against the same 42 prompts every other model ran.

What I built with MiniMax M3

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what MiniMax M3 shipped on the bench: 42 one-shot demos across 1,048,576-token context — matches GLM-5.2 and Fable 5 of context. Of those, 42 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

  • 1M token context — full repo / full deep-research corpus fits in one call
  • $0.30/M input is roughly 1/30th of Opus 4.8 — built for high-volume agent loops
  • Solid one-shot HTML output — clean structure on game and visual prompts

Trade-offs

  • Less polished than Fusion's panel-ensembled output on the toughest deep builds
  • Newer model — less community calibration vs Fable 5 / Opus 4.8

Best for

  • High-volume agent workflows where per-call cost dominates
  • 1M-context tasks (whole-repo refactors, deep-research synthesis)
  • Drop-in cheaper alternative to GLM-5.2 with comparable 1M context

Every demo by MiniMax M3

42 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders
258documented wins
38countries
$100k+/mocommunity MRR