MiniMax

MiniMax M3

1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench.

Context1,048,576-token context — matches GLM-5.2 and Fable 5

Pricing$0.30 / 1M input tokens, $1.50 / 1M output

Tasks tested42

Avg score7.96/10 average

Medals🥇12 🥈11 🥉8

Release2026-06-18

Reference benchmarks for MiniMax M3

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for MiniMax M3 is honest about what's measured.

Context window

1,048,576 tokens

source: /openrouter

Per-token cost

$0.30 / M input · $1.50 / M output

source: /openrouter

What is MiniMax M3?

MiniMax M3 is the MiniMax frontier model with a 1,048,576-token context — matches GLM-5.2 and Fable 5 context window, released 2026-06-18. Tagline: 1M-context frontier model at $0.30/M tokens — cheapest big-context model on the bench..

Pricing detail. MiniMax M3 is the cheapest 1M-context frontier model on the bench — roughly 1/200th the per-call cost of OpenRouter Fusion and 1/30th of Claude Opus 4.8. Designed for high-volume agent workloads where context length matters but per-call budget is tight.

How I use it inside the Agent OS. Bench prompts dispatched via OpenRouter. Scored by Claude judge against the same 42 prompts every other model ran.

What I built with MiniMax M3

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what MiniMax M3 shipped on the bench: 42 one-shot demos across 1,048,576-token context — matches GLM-5.2 and Fable 5 of context. Of those, 42 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

1M token context — full repo / full deep-research corpus fits in one call
$0.30/M input is roughly 1/30th of Opus 4.8 — built for high-volume agent loops
Solid one-shot HTML output — clean structure on game and visual prompts

Trade-offs

Less polished than Fusion's panel-ensembled output on the toughest deep builds
Newer model — less community calibration vs Fable 5 / Opus 4.8

Best for

High-volume agent workflows where per-call cost dominates
1M-context tasks (whole-repo refactors, deep-research synthesis)
Drop-in cheaper alternative to GLM-5.2 with comparable 1M context

Every demo by MiniMax M3

42 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

Neon Breakout — paddle, ball, brick wall, particle trails, score HUD.

Nordic dungeon crawler on three.js — torch-lit corridors, skeletons.

25KB 3D dogfight with enemy AI, missiles, guns.

Raycaster + sprite enemies + gun + HUD. 21KB of game logic.

Dragonflight 🥇

Fly a dragon through neon rings — full HUD, score, fire-breath gauge.

Dragonrealm 🥇

34KB frozen open world — snowy mountains, pines, flying dragon, full HUD.

M3 chose an arcade build with HUD + score + lives + sound effects.

Neonblaster 🥇

30KB neon space shooter — waves, bosses, power-ups, screen-shake.

Cyberpunk flythrough with neon towers + light trails.

Top-down neon racer with vapor trails + drift physics + lap timer.

Nordiccrypt 🥇

41KB Nordic crypt with torch-lit corridors, chasing skeletons, boss room.

Pseudo-3D OutRun racer.

Canvas-2D billiards with 16 balls, pockets, click-drag aim.

59KB third-person arcade racer. Banking turns, speed boost, drift, lap timer.

Canvas-2D raycaster — WASD walking, textured walls, distance fog.

29KB top-down RPG with tilemap, NPCs, combat, inventory.

43KB Skyrim attempt on three.js — snowy terrain, pines, dragon, HUD.

Twilightvale 🥈

47KB — densest open-world. Village, NPCs, combat, day/night, weather, inventory.

Voxelcraft 🥈

27KB Minecraft-style sandbox — break/place blocks, hotbar, day/night cycle.

Nova-1 landing with animated gradient hero, three feature cards, footer CTA.

38KB working desktop — wallpaper, dock, draggable Notes/Paint/Terminal/Calculator windows.

Minimal gravitational-lens shader.

Boids flocking with separation/alignment/cohesion + mouse repulsion.

Verlet cloth sim, draggable, pinnable corners. 17KB clean implementation.

2D fluid sim with click-drag injection.

WebGL Mandelbrot shader with click-to-zoom, hold-to-continuous-zoom.

Spiral galaxy on three.js — particle stars, slow rotation.

44KB top-down orbit map — Mercury through Mars with accurate relative speeds, hover info cards.

Particleforge 🥈

Particle sculptor with mouse gravity + preset modes + FPS counter.

Pathtracer 🥇

62KB WebGL shader path tracer with sample accumulation.

Reactiondiff 🥇

31KB Gray-Scott shader with click-to-seed.

M3's solar — three.js scene with sun + planets + orbits.

3D tunnel flythrough with distorted starfield.

Aurora ribbons over mountain silhouette.

Click-to-launch fireworks with particle trails.

Metaball lava-lamp shader with warm gradient.

Classic Matrix rain — falling green glyphs.

Minimal 1KB plasma. Brief is barely met.

Neon grid + scanline sun synthwave flythrough.

Tron-style terrain flythrough.

29KB Temple-Run-style voxel runner on three.js — lane switching, jump + slide, coins.

Gerstner-wave ocean with sun reflection.

Compare MiniMax M3 against every other model

Every head-to-head featuring MiniMax M3. Verdicts shown for scored pairs.

MiniMax M3 vs Opus 4.8

Opus 4.8 leads 11–1

MiniMax M3 vs GLM-5.2

GLM-5.2 leads 8–3

MiniMax M3 vs Grok

Grok leads 19–6

MiniMax M3 vs Fusion

Fusion leads 26–8

MiniMax M3 vs Fugu Ultra

Fugu Ultra leads 3–1

MiniMax M3 vs Qwen 3.7

MiniMax M3 leads 3–0

MiniMax M3 vs Kimi K2.7

MiniMax M3 leads 10–7

MiniMax M3 vs Fugu Mini

MiniMax M3 vs Gemma-4 12B Coder

MiniMax M3 leads 6–0

MiniMax M3 vs Kimi K2.7 · Fast

3 shared tasks · unscored

MiniMax M3 vs Kimi K2.7 · No-Think

3 shared tasks · unscored

MiniMax M3 vs Kimi K2.7 · Quality

3 shared tasks · unscored

MiniMax M3 vs Claude Fable 5

MiniMax M3 vs Claude Mythos 5

MiniMax M3 vs Kilo Code

See all 66 comparisons across every model →

Quick pill index

Direct comparisons against every other scored model on the bench:

MiniMax M3 vs Opus 4.8 MiniMax M3 vs GLM-5.2 MiniMax M3 vs Grok MiniMax M3 vs Fusion MiniMax M3 vs Fugu Ultra MiniMax M3 vs Qwen 3.7 MiniMax M3 vs Kimi K2.7 MiniMax M3 vs Fugu Mini MiniMax M3 vs Gemma-4 12B Coder

Read more on agentos.guide:

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders

258documented wins

38countries

$100k+/mocommunity MRR

Join AIPB · $59/mo → Read the Agent OS guides →