Dragonrealm

Q: What's the best AI model for Dragonrealm?

Fusion, with 9.0/10. Its build, The Dragon Realm, is a Skyrim-style frozen open world with a full HUD (score/vitality/stamina), snowy mountains, a low-poly pine forest, and a flying dragon, controlled with WASD plus mouse-look. It tied with GLM's deep build at the top of the task.

Q: How many AI models attempted Dragonrealm?

27 models on Goldie Bench have attempted Dragonrealm: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Hy3, Inkling, Kimi K3, LongCat-2.0, MiniMax M3, Hermes MoA, Muse Spark 1.2, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, DeepSeek V4 Pro, DeepSeek V4 Flash, Grok, Kimi K2.7, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality.

The Dragon Realm — Skyrim-style frozen open world, walk-into-the-snow, draw your sword. Julian's flagship deep-build prompt.

Category: Game

Models tested: 27

Scored: 20/27

Avg score: 7.88/10

Winner: Fusion

What I asked each model — the Dragonrealm prompt

Every model on this page got this exact prompt inside the Agent Operating System: The Dragon Realm — Skyrim-style frozen open world, walk-into-the-snow, draw your sword. Julian's flagship deep-build prompt.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 27 frontier models have attempted it so far; the full list is given above.

Why this task matters. Dragonrealm is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.

How each model handled Dragonrealm

Each entry shows my 0–10 score from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

Claude Fable 5 Anthropic

• 7.8/10

What I saw: Clean, atmospheric frozen scene with layered mountains, snow-capped pines, standing stones and a brazier fire that reads clearly as a Skyrim-inspired vale; solid low-poly polish and full control set (walk/sprint/sword/dragon). Falls short of the top tier because the visible frame is a bit sparse and flat—no dragon or sword in view, terrain lacks detail up close—so it's shippable and pleasant but not a task-topping standout.

▶ Play Claude Fable 5's attempt →

Fugu Ultra Sakana AI

• 7.0/10

What I saw: Ultra v2. Skyrim-style frozen open world. Smoke test: MAYBE (0.1% diff). The build loads but shows minimal change on generic input.

▶ Play Fugu Ultra's attempt →
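The "0.1% diff" figure in the smoke test above is not defined on this page. One plausible reading, offered purely as an assumption, is the share of screen pixels that change between a frame captured before a generic input and a frame captured after it. The sketch below illustrates that reading with plain Python lists standing in for grayscale frames; the function name `changed_fraction`, the `tolerance` parameter, and the sample frames are all hypothetical and are not taken from the Goldie Bench harness.

```python
def changed_fraction(before, after, tolerance=0):
    """Fraction of pixels whose grayscale value differs by more than `tolerance`.

    `before` and `after` are flat sequences of pixel values of equal length.
    This is an illustrative metric, not the benchmark's actual smoke test.
    """
    if len(before) != len(after):
        raise ValueError("frames must have the same number of pixels")
    if not before:
        raise ValueError("frames must not be empty")
    changed = sum(1 for a, b in zip(before, after) if abs(a - b) > tolerance)
    return changed / len(before)


# Hypothetical frames: 1,000 grayscale pixels, one of which changes after input.
frame_before = [128] * 1000
frame_after = list(frame_before)
frame_after[500] = 255

diff = changed_fraction(frame_before, frame_after)
print(f"{diff:.1%} of pixels changed")
```

Under this reading, a very small fraction means the page rendered but barely reacted to input, which would match the "loads, minimal change" note. Where the line between PASS and MAYBE sits is not stated here, so the sketch does not guess at thresholds.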

Fugu Ultra 1.1 Sakana AI

🥉 8.6/10 · Frozen combat realm

What I saw: Strong Skyrim-vibe frozen open world with layered snowy mountains, a player with visible sword, multiple approaching enemies with health orbs, runes, an event banner ('DRAGON SWOOP · FIRE BREATH') and polished HUD/compass — clearly beats the empty-walking-sim trap. Enemy models are a touch simplistic and the moon/sky are basic, but the visible combat and worldbuilding make it a task winner.

▶ Play Fugu Ultra 1.1's attempt →

Fugu Mini Sakana AI

• 8.0/10

What I saw: Skyrim-style frozen open world with HUD. Smoke-test PASS.

▶ Play Fugu Mini's attempt →

Fusion OpenRouter

🥇 9.0/10 · winner · Dragon Realm

What I saw: The Dragon Realm — Skyrim-style frozen open world with full HUD (score/vitality/stamina), snowy mountains, low-poly pine forest, a flying dragon. WASD + mouse-look. Tied with GLM's deep build at the top of the task.

▶ Play Fusion's attempt →

Gemini 3.6 Flash Google

• 8.3/10

What I saw: Strong atmospheric frozen world with snowy terrain, pine forest, night sky, a visible held sword, a shrine/altar with particle effects, and a visible humanoid enemy plus full HUD (vitality/stamina/compass/kills). Polished and clearly shippable, but the first-person sword model looks a bit awkwardly placed and combat isn't confirmed on-screen, keeping it just under the top tier.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 Zhipu / Z.ai

• 7.5/10

What I saw: 142 KB build. Plays clean, but plain.

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol OpenAI

🥉 8.6/10 · Frostbound atmosphere wins

What I saw: Strong on-brief render: cohesive misty low-poly frozen world with layered snow mountains, pines, a ruined watchtower objective, a flying dragon silhouette, drawn sword in view, and elegant Skyrim-style HUD (compass, quest marker, hint bar, health). Very polished atmosphere; only mild weaknesses are the flat mid-ground and somewhat empty snow plain, but it matches/edges the best of the field.

▶ Play GPT-5.6 Sol's attempt →

Hy3 Tencent Hunyuan

• 7.2/10

What I saw: Strong atmospheric snowy world with layered pines, soft shadows, snowfall, and clean HUD (health/stamina/compass/sword chip), but the hero reads as a stubby hooded blob with hidden face and no visible arms/legs, undercutting the flagship Skyrim-ranger fantasy.

▶ Play Hy3's attempt →

Inkling Thinking Machines

• 6.3/10

What I saw: Renders a clean, atmospheric snowy scene with polished Cinzel title, mountains, and vignette, but the snow particles are oversized blocky squares (size bug from the broken forEach loop) that read as debris, and crucially the sword isn't visible in-frame — undercutting the core 'draw your sword' brief. Solid mood, but generic and missing key interactivity payoff.

▶ Play Inkling's attempt →
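For a quick leaderboard view of the ten entries above, the scores can be sorted and averaged with a few lines of Python. This is a minimal sketch using only the numbers printed in this section; the `rank` helper is a name introduced here for illustration. Because it covers ten entries rather than all 20 scored attempts, its mean is not expected to match the 7.88/10 average in the summary.

```python
from statistics import mean

# Scores copied from the ten entries listed in this section.
scores = {
    "Claude Fable 5": 7.8,
    "Fugu Ultra": 7.0,
    "Fugu Ultra 1.1": 8.6,
    "Fugu Mini": 8.0,
    "Fusion": 9.0,
    "Gemini 3.6 Flash": 8.3,
    "GLM-5.2": 7.5,
    "GPT-5.6 Sol": 8.6,
    "Hy3": 7.2,
    "Inkling": 6.3,
}


def rank(scores):
    """Return (model, score) pairs, highest score first; ties keep input order."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(scores)
subset_mean = round(mean(scores.values()), 2)

for place, (model, score) in enumerate(ranking, start=1):
    print(f"{place:>2}. {model:<18} {score:.1f}")
print(f"Mean of these ten: {subset_mean}")
```

Sorted this way, Fusion leads at 9.0, Fugu Ultra 1.1 and GPT-5.6 Sol share second at 8.6, and Inkling trails at 6.3; the mean of the ten is 7.83.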

The winner on Dragonrealm

Fusion took gold on this task with 9.0/10 for its Dragon Realm build.

See Fusion's full model card: /models/fusion.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.
