What's the prompt for the Dragonflight test?

Dragon Flight — fly a dragon through neon rings at speed, full HUD, fire-breath, fury meter. Every model receives this exact prompt, one shot, single HTML file out.

Game

Dragonflight

Q: What's the best AI model for Dragonflight?

Fusion (OpenRouter), with 9.0/10. Its retry at a 24K-token budget is a complete 27KB three.js/WebGL build with a requestAnimationFrame loop, three input handlers and properly closed tags: you fly a dragon through neon rings with a HUD showing score, a fire-breath gauge and a fury meter. This complete retry replaced the original truncated attempt.

Q: How many AI models attempted Dragonflight?

25 models on Goldie Bench have attempted Dragonflight: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K3, MiniMax M3, Hermes MoA, Muse Spark 1.2, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.7, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality.


Category: Game

Models tested: 25

Scored: 19/25

Avg score: 7.57/10

Winner: Fusion

What I asked each model — the Dragonflight prompt

Every model on this page got this exact prompt inside the Agent Operating System: Dragon Flight — fly a dragon through neon rings at speed, full HUD, fire-breath, fury meter.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on its first run is what's on this page, with one exception: Fusion's first attempt was truncated and has been replaced by a complete retry at 24K tokens. 25 frontier models have attempted it so far: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K3, MiniMax M3, Hermes MoA, Muse Spark 1.2, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.7, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality.

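Why this task matters. Dragonflight packs several interacting systems into one file: real-time 3D flight through rings at speed, a fire-breath mechanic, a fury meter and a full HUD. That makes it a textbook test of game-class capability, the kind of build that exposes whether a model is pattern-matching or actually reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.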

How each model handled Dragonflight

Each model's 0–10 score comes from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

Claude Fable 5 Anthropic

• 7.4/10

What I saw: Renders cleanly with a polished neon HUD (speed/score/rings/fury), a large glowing ring, floating decor and a green winged dragon — all brief elements present. Weak spot is the camera framing: the dragon is oddly close and awkwardly angled with wings dominating the frame, giving a less-controlled, generic feel versus the field's best.

▶ Play Claude Fable 5's attempt →

Fugu Ultra Sakana AI

🥉 8.5/10

What I saw: Ultra v2 — dragon through neon rings, full HUD. Smoke-test PASS (4.2% pixel diff).

▶ Play Fugu Ultra's attempt →

Fugu Ultra 1.1 Sakana AI

• 6.5/10

What I saw: A full, polished HUD (health/fury/alt/speed, score, rings, wyverns), neon rings and a visible enemy wyvern are all present, but the dragon model renders as a jumbled red blob with scattered floating parts, and the scene reads as cluttered and broken rather than a clean flight. Combat and fire-breath look plausible in the source, but the visual execution undercuts them.

▶ Play Fugu Ultra 1.1's attempt →

Fugu Mini Sakana AI

• 8.0/10

What I saw: Dragon flight through neon rings with HUD. Smoke-test PASS.

▶ Play Fugu Mini's attempt →

Fusion OpenRouter

🥇 9.0/10 · winner · flight game

What I saw: Retry at a 24K-token budget, now complete: a 27KB three.js/WebGL build with a requestAnimationFrame loop, three input handlers and properly closed tags. You fly a dragon through neon rings with a HUD showing score, a fire-breath gauge and a fury meter. This complete retry replaced the original truncated attempt.

▶ Play Fusion's attempt →

Gemini 3.6 Flash Google

• 6.3/10

What I saw: Polished HUD with vitals, fury, score/rings/wyverns and clean neon aesthetic, plus a decent 3D dragon and visible enemy/rings in-scene; however the world reads dim and empty with barely-visible rings and no combat action shown, keeping it below the strong bar.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 Zhipu / Z.ai

• 7.5/10

What I saw: 57KB · plays clean · plain

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol OpenAI

• 8.4/10 · neon dragon HUD

What I saw: Renders a well-modeled segmented dragon with wings, glowing neon rings, cityscape depth and a clean full HUD (score/rings/velocity/combo, fury core meter, message banner, controls) that nails the brief; slightly generic minimalist environment and no visible fire-breath in this frame keep it just under the top-tier winners.

▶ Play GPT-5.6 Sol's attempt →

Grok xAI

🥉 8.5/10

What I saw: Fly a dragon through neon rings with score + fire-breath + fury HUD. 28KB.

▶ Play Grok's attempt →

Inkling Thinking Machines

• 6.3/10

What I saw: Strong HUD styling — title, Speed/Rings stats, and a polished fury meter all render crisply — but the screenshot shows a nearly empty scene with no visible dragon and only one distant ring dominating the frame, so the core flight gameplay reads as broken/off-camera at this moment. Great chrome, weak visual payoff.

▶ Play Inkling's attempt →

The winner on Dragonflight

Fusion took gold on this task with 9.0/10 (tagged "winner · flight game").

See Fusion's full model card: /models/fusion. Direct head-to-head against the runner-up: Fusion vs Hermes MoA.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.

▶ LIVE

Claude Fable 5

Anthropic

Renders cleanly with a polished neon HUD (speed/score/rings/fury), a large glowing ring, floating decor and a green winged dragon — all brief elements present. Weak spot is the camera framing: the dragon is oddly close and awkwardly angled with wings dominating the frame, giving a less-controlled, generic feel versus the field's best.

▶ LIVE

Fugu Ultra 🥉

Sakana AI

Ultra v2 — dragon through neon rings, full HUD. Smoke-test PASS (4.2% pixel diff).

▶ LIVE

Fugu Ultra 1.1

Sakana AI

A full, polished HUD (health/fury/alt/speed, score, rings, wyverns), neon rings and a visible enemy wyvern are all present, but the dragon model renders as a jumbled red blob with scattered floating parts, and the scene reads as cluttered and broken rather than a clean flight. Combat and fire-breath look plausible in the source, but the visual execution undercuts them.

▶ LIVE

Fugu Mini

Sakana AI

Dragon flight through neon rings with HUD. Smoke-test PASS.

▶ LIVE

Fusion 🥇

OpenRouter

Retry at a 24K-token budget, now complete: a 27KB three.js/WebGL build with a requestAnimationFrame loop, three input handlers and properly closed tags. You fly a dragon through neon rings with a HUD showing score, a fire-breath gauge and a fury meter. This complete retry replaced the original truncated attempt.

▶ LIVE

Gemini 3.6 Flash

Google

Polished HUD with vitals, fury, score/rings/wyverns and clean neon aesthetic, plus a decent 3D dragon and visible enemy/rings in-scene; however the world reads dim and empty with barely-visible rings and no combat action shown, keeping it below the strong bar.

▶ LIVE

GLM-5.2

Zhipu / Z.ai

57KB · plays clean · plain

▶ LIVE

GPT-5.6 Sol

OpenAI

Renders a well-modeled segmented dragon with wings, glowing neon rings, cityscape depth and a clean full HUD (score/rings/velocity/combo, fury core meter, message banner, controls) that nails the brief; slightly generic minimalist environment and no visible fire-breath in this frame keep it just under the top-tier winners.

▶ LIVE

Grok 🥉

xAI

Fly a dragon through neon rings with score + fire-breath + fury HUD. 28KB.

▶ LIVE

Inkling

Thinking Machines

Strong HUD styling — title, Speed/Rings stats, and a polished fury meter all render crisply — but the screenshot shows a nearly empty scene with no visible dragon and only one distant ring dominating the frame, so the core flight gameplay reads as broken/off-camera at this moment. Great chrome, weak visual payoff.

▶ LIVE

Kimi K3

Moonshot AI

Strong 3D scene that renders cleanly: a modeled dragon flies through glowing neon rings over a synthwave terrain, with the full HUD (score, rings, speed, combo, fury meter) present and on-brief. It loses a few points because fire-breath isn't visible in the shot and the dragon model reads a bit plain and spherical up close, but the build is polished and cohesive.

▶ LIVE

MiniMax M3 🥉

MiniMax

Fly a dragon through neon rings — full HUD, score, fire-breath gauge.

▶ LIVE

Hermes MoA 🥈

Hermes · Mixture of Agents

Strong three.js dragon-flight build with a polished neon HUD (rings/speed/streak/fury), a genuinely articulated multi-segment dragon with flapping wings, additive fire-breath particles, fury mode, and three input paths (pointer/keyboard/touch). It is visibly richer and more cohesive than solo Opus 4.8's leaner 12KB entry and edges past Grok, MiniMax and Fugu on dragon detail and effect layering, landing just behind Fusion's complete retry for second place.

▶ LIVE

Muse Spark 1.2

How I scored Dragonflight — methodology

Three axes, each scored 0–10, then averaged:

Runs: drop the .html into a browser. If it opens to a broken page, it scores zero.

Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier?

Looks good: visual polish, motion and interactivity. This is where most of the gap between gold and silver lives.
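To make the arithmetic of the three-axis rule concrete, here is a minimal Python sketch. It assumes the axes are weighted equally and reads "it scores zero" as zeroing the whole entry; the page doesn't say whether only the Runs axis or the overall score drops to zero. The function `dragonflight_score` and its parameters are hypothetical, not the bench's actual code, and the example inputs are made up rather than taken from any model's real breakdown.

```python
def dragonflight_score(runs: float, hits_brief: float, looks_good: float,
                       opens_cleanly: bool = True) -> float:
    """Average three 0-10 axis scores into one 0-10 score.

    Hypothetical helper. Assumptions: equal weights, and a page that
    fails to open zeroes the whole entry, not just the Runs axis.
    """
    # Reject axis scores outside the 0-10 scale.
    for axis in (runs, hits_brief, looks_good):
        if not 0 <= axis <= 10:
            raise ValueError(f"axis score out of range: {axis}")
    # A broken page scores zero regardless of the other axes (assumption).
    if not opens_cleanly:
        return 0.0
    # Plain mean of the three axes, rounded to one decimal place.
    return round((runs + hits_brief + looks_good) / 3, 1)


print(dragonflight_score(10, 9, 8))                       # prints 9.0
print(dragonflight_score(10, 9, 8, opens_cleanly=False))  # prints 0.0
```

Rounding to one decimal matches how scores appear on this page (for example 7.4/10).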

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.


The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 4,000+ founders shipping with it every day all live inside the AI Profit Boardroom.

4,000+ founders · 258 documented wins · 38 countries · $59/mo (monthly)


