Q: What's the prompt for the Arcade test?

Arcade — classic arcade-style game (pick: tetris, breakout, snake). Every model receives this exact prompt, one shot, single HTML file out.

Q: What's the best AI model for Arcade?

GPT-5.6 Sol — Gorgeous, fully-rendered neon breakout with rainbow brick grid, glowing paddle/ball, retro perspective grid floor, and clean HUD/controls/pause overlay — strong arcade identity backed by solid physics, DPR scaling, particles and audio. Polish and cohesion put it at the top of the field.

Q: How many AI models attempted Arcade?

24 models on Goldie Bench have attempted Arcade: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K2.7, Kimi K3, MiniMax M3, Hermes MoA, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality.

Arcade — classic arcade-style game (pick: tetris, breakout, snake).

Category: Game

Models tested: 24

Scored: 19/24

Avg score: 7.85/10

Winner: GPT-5.6 Sol

What I asked each model — the Arcade prompt

Every model on this page got this exact prompt inside the Agent Operating System: Arcade — classic arcade-style game (pick: tetris, breakout, snake).

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 24 frontier models have attempted it so far.

Why this task matters. Arcade is a textbook test of game-building capability: a playable game only works if rendering, input handling, collision logic and game state all fit together, so it exposes whether a model is pattern-matching or actually reasoning. A model that ships this in one shot is usually a reasonable candidate to wire into your agent loop for harder tasks of the same shape.

How each model handled Arcade

Each entry shows my 0–10 score from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

Claude Fable 5 Anthropic

• 8.3/10

What I saw: Clean, polished neon breakout with glowing rainbow bricks, gradient paddle, twinkling starfield, particle effects, autopilot demo, multi-input controls and level progression: strong and shippable. A slightly generic layout, and a ball that failed to render its shadow badge in this frame, keep it just under the top tier.

▶ Play Claude Fable 5's attempt →

Fugu Ultra Sakana AI

• 7.0/10

What I saw: Ultra v2, Breakout-style. Smoke-test MAYBE (0.4% diff): it runs, but generic click + WASD input didn't trigger strong motion; the paddle may need mouse movement.

▶ Play Fugu Ultra's attempt →

Fugu Ultra 1.1 Sakana AI

• 5.5/10

What I saw: The brief asked for a classic arcade game (tetris/breakout/snake); this is a 3D flying-shooter titled 'Breakout Hunter' with drones and radar — polished HUD and clean low-poly visuals, but it completely misses the arcade brief and there's no actual breakout/paddle/brick gameplay visible despite the label. Strong presentation, wrong genre.

▶ Play Fugu Ultra 1.1's attempt →

Fugu Mini Sakana AI

🥉 8.5/10

What I saw: 55KB Breakout-style — paddle, ball, brick wall, particles, score HUD. Smoke-test PASS (large pixel diff after input).

▶ Play Fugu Mini's attempt →

Fusion OpenRouter

🥉 8.5/10

What I saw: Neon Breakout with combo system, lives-dot indicators, floating-card overlay animation, full pill HUD. Custom cursor, mobile pointer support, particle juice. Tied with Opus on game-feel.

▶ Play Fusion's attempt →

Gemini 3.6 Flash Google

• 6.5/10

What I saw: Strong polished HUD, clean 3D player craft with visible enemy and combat systems (projectiles, HP damage taken at 88), but it ignored the brief entirely — this is a twin-stick shooter, not a classic arcade pick (tetris/breakout/snake), and the scene reads sparse with only one enemy visible.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 Zhipu / Z.ai

• 8.0/10

What I saw: All three models in this comparison (Opus, Kimi and GLM) shipped a genuinely juicy game. Opus's breakout had the most game-feel, with particle bursts and a live combo. Kimi's breakout was clean and solid. GLM went its own way with fullscreen neon asteroids. The closest of the practical five.

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol OpenAI

🥇 8.6/10 · neon breakout polish

What I saw: Gorgeous, fully-rendered neon breakout with rainbow brick grid, glowing paddle/ball, retro perspective grid floor, and clean HUD/controls/pause overlay — strong arcade identity backed by solid physics, DPR scaling, particles and audio. Polish and cohesion put it at the top of the field.

▶ Play GPT-5.6 Sol's attempt →

Grok xAI

• 8.0/10

What I saw: A crisp neon Asteroids — glowing ship, lives, score, drifting rocks, real sound. Shippable straight out of one prompt.

▶ Play Grok's attempt →

Inkling Thinking Machines

• 8.2/10

What I saw: A polished 3D Breakout in Three.js with a gorgeous gradient title, glowing rainbow brick wall, paddle/ball follow, trail dots and live score badge — clearly renders and is on-brief. Held back from top spot by the loose 2D collision math on a 3D perspective view (paddle bounce/wall bounds are hardcoded and can feel off) and lack of lives/win state, but visually it's a strong, shippable entry.

▶ Play Inkling's attempt →
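
Several notes above cite a smoke-test verdict based on the pixel diff after input (for example "MAYBE (0.4% diff)" and "PASS (large pixel diff after input)"). The harness itself is not described on this page, so the following is only a minimal sketch of how such a check could work, assuming two same-sized RGB screenshots captured before and after synthetic input. The function names, the per-channel tolerance and both verdict thresholds are hypothetical, chosen for illustration rather than taken from the actual harness.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each channel 0-255
Frame = List[List[Pixel]]      # rows of pixels


def pixel_diff_ratio(before: Frame, after: Frame, tolerance: int = 8) -> float:
    """Return the fraction of pixels that changed between two frames.

    A pixel counts as changed when any channel differs by more than
    `tolerance` (a hypothetical value meant to ignore tiny rendering noise).
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("frames must have identical dimensions")

    total = sum(len(row) for row in before)
    if total == 0:
        raise ValueError("frames must not be empty")

    changed = 0
    for row_b, row_a in zip(before, after):
        for p_b, p_a in zip(row_b, row_a):
            if any(abs(c_b - c_a) > tolerance for c_b, c_a in zip(p_b, p_a)):
                changed += 1
    return changed / total


def smoke_verdict(
    ratio: float,
    fail_below: float = 0.0005,   # hypothetical: under 0.05% changed, nothing moved
    maybe_below: float = 0.01,    # hypothetical: under 1% changed, weak response
) -> str:
    """Map a diff ratio to a PASS / MAYBE / FAIL label (assumed thresholds)."""
    if ratio < fail_below:
        return "FAIL"
    if ratio < maybe_below:
        return "MAYBE"
    return "PASS"
```

Under these assumed thresholds, a 0.4% diff (ratio 0.004) falls in the MAYBE band. The real cut-offs may differ; a game that only reacts to mouse movement could also land in MAYBE under click + WASD input while being fully playable.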

The winner on Arcade

GPT-5.6 Sol took gold on this task for its neon breakout polish.

See GPT-5.6 Sol's full model card: /models/gpt56.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.
