MiniMax M3
1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench.
Reference benchmarks for MiniMax M3
These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for MiniMax M3 is honest about what's measured.
What is MiniMax M3?
MiniMax M3 is the MiniMax frontier model with a 1,048,576-token context — matches GLM-5.2 and Fable 5 context window, released 2026-06-18. Tagline: 1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench..
Pricing detail. MiniMax M3 is the cheapest 1M-context frontier model on the bench — roughly 1/200th the per-call cost of OpenRouter Fusion and 1/30th of Claude Opus 4.8. Designed for high-volume agent workloads where context length matters but per-call budget is tight.
How I use it inside the Agent OS. Bench prompts dispatched via OpenRouter. Scored by Claude judge against the same 42 prompts every other model ran.
What I built with MiniMax M3
Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what MiniMax M3 shipped on the bench: 42 one-shot demos across 1,048,576-token context — matches GLM-5.2 and Fable 5 of context. Of those, 42 are scored against the field with my honest 0–10 from the source guides at agentos.guide.
Strengths
- 1M token context — full repo / full deep-research corpus fits in one call
- $0.30/M input is roughly 1/30th of Opus 4.8 — built for high-volume agent loops
- Solid one-shot HTML output — clean structure on game and visual prompts
Trade-offs
- Less polished than Fusion's panel-ensembled output on the toughest deep builds
- Newer model — less community calibration vs Fable 5 / Opus 4.8
Best for
- High-volume agent workflows where per-call cost dominates
- 1M-context tasks (whole-repo refactors, deep-research synthesis)
- Drop-in cheaper alternative to GLM-5.2 with comparable 1M context
Every demo by MiniMax M3
42 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVE
▶ LIVECompare MiniMax M3 against every other model
Every head-to-head featuring MiniMax M3. Verdicts shown for scored pairs.
See all 66 comparisons across every model →
Quick pill index
Direct comparisons against every other scored model on the bench:
MiniMax M3 vs Opus 4.8 MiniMax M3 vs GLM-5.2 MiniMax M3 vs Grok MiniMax M3 vs Fusion MiniMax M3 vs Fugu Ultra MiniMax M3 vs Qwen 3.7 MiniMax M3 vs Kimi K2.7 MiniMax M3 vs Fugu Mini MiniMax M3 vs Gemma-4 12B CoderRead more on agentos.guide:
Run this stack yourself.
Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.