Q: What's the prompt for the Skyrim test?

The prompt is "Skyrim-lite — first-person open-world fantasy explorer." Every model receives this exact prompt in one shot and must output a single HTML file.


Skyrim

Q: What's the best AI model for Skyrim?

Claude Fable 5. Its AAA rebuild is showcase-grade: a dramatic Nordic dusk valley with layered snow-capped peaks, pine forests and ruins, a first-person sword AND studded wooden shield in real PBR wood/metal/leather, drifting clouds and snowfall, and a cohesive HUD (compass strip, objective card, HP/stamina/magicka bars). Walking and looking around were verified. Only the missing on-screen wolves keep it from a flawless read.

Q: How many AI models attempted Skyrim?

25 models on Goldie Bench have attempted Skyrim: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Inkling, Kimi K3, LongCat-2.0, MiniMax M3, Hermes MoA, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, DeepSeek V4 Pro, DeepSeek V4 Flash, Grok, Kimi K2.7.

Skyrim-lite — first-person open-world fantasy explorer.

Category: Game
Models tested: 25
Scored: 18/25
Avg score: 7.91/10
Winner: Claude Fable 5

What I asked each model — the Skyrim prompt

Every model on this page got this exact prompt inside the Agent Operating System: Skyrim-lite — first-person open-world fantasy explorer.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 25 frontier models have attempted it so far: Claude Fable 5, Fugu Ultra, Fugu Ultra 1.1, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Inkling, Kimi K3, LongCat-2.0, MiniMax M3, Hermes MoA, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, DeepSeek V4 Pro, DeepSeek V4 Flash, Grok, Kimi K2.7.

Why this task matters. Skyrim is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.

How each model handled Skyrim

Each score is my 0–10 rating from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

Claude Fable 5 (Anthropic)

🥇 9.0/10

What I saw: The AAA rebuild is showcase-grade: a dramatic Nordic dusk valley with layered snow-capped peaks, pine forests and ruins, a first-person sword AND studded wooden shield in real PBR wood/metal/leather, drifting clouds and snowfall, and a cohesive HUD (compass strip, objective card, HP/stamina/magicka bars). Walking and looking around were verified. Only the missing on-screen wolves keep it from a flawless read.

▶ Play Claude Fable 5's attempt →

Fugu Ultra (Sakana AI)

• 7.0/10

What I saw: Ultra v2 (gap-fill): a Skyrim-style open world. Smoke test: MAYBE (0.0% diff). FPS mouse-look requires pointer lock, which the generic test input didn't engage, so this run is flagged for manual verification.

▶ Play Fugu Ultra's attempt →

Fugu Ultra 1.1 (Sakana AI)

• 6.8/10

What I saw: Strong Skyrim-flavored HUD (compass, health/stamina/magicka bars, kills tracker, weapon in hand) and a decent low-poly open world with pines, mountains and a dirt path. But the scene looks sparse and the enemy/combat presence is weak: one distant blocky figure and no visible engaged fight, so it reads more as an explorer and falls short of the best in the field.

▶ Play Fugu Ultra 1.1's attempt →

Fugu Mini (Sakana AI)

• 8.0/10

What I saw: Mini gap-fill: a 13KB Skyrim-style open world. Smoke test: PASS (0.9% diff).

▶ Play Fugu Mini's attempt →

Fusion (OpenRouter)

🥇 9.0/10 · winner · open world

What I saw: Retried at 24K tokens and now complete: 25KB of three.js + WebGL with a requestAnimationFrame (rAF) loop, 8 input handlers, and properly closed tags. Snowy Nordic terrain, low-poly pines, rocks, rolling hills, a dragon overhead, and a health + stamina HUD. Controls are WASD plus mouse-look.

▶ Play Fusion's attempt →

Gemini 3.6 Flash (Google)

• 7.8/10

What I saw: Strong Skyrim HUD (compass, HP/Magicka/Stamina bars, kill counter, first-person sword viewmodel) and a visible enemy with a 'HOSTILES ENGAGED' combat state. It is weakened by sparse terrain, a floating white slab artifact, and only one distant humanoid rather than an active combat encounter.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 (Zhipu / Z.ai)

• 8.0/10

What I saw: 22KB; plays cleanly; built with three.js, WebGL, and pointer lock.

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol (OpenAI)

• 8.4/10 · Frostfall wilderness explorer

What I saw: Strong on-brief 3D first-person Skyrim-lite: snowy low-poly terrain, distant mountains, pine forest, a stone keep, first-person sword, and a full themed HUD (compass, quest tracker, health/stamina bars). Cohesive and shippable, though the snowy foreground reads a bit washed-out/flat and it stops just short of the field's best.

▶ Play GPT-5.6 Sol's attempt →

Inkling (Thinking Machines)

• 6.3/10

What I saw: Renders a clean low-poly 3D scene with trees, rocks, and a ruin pillar plus polished title/HUD, but the world feels sparse and floaty with no visible ground plane, no mountains despite the promise, and generic pointer-lock exploration that falls well short of the field's best.

▶ Play Inkling's attempt →

Kimi K3 (Moonshot AI)

• 8.4/10

What I saw: Renders a genuinely atmospheric open world with layered fog, warm low sun, distant mountains, a village of houses, scattered pines, plus a polished serif HUD with scrolling compass and stamina bar — clearly on-brief and shippable. The oversized green glow sprites (fireflies/particles) look blown-out and slightly buggy in the foreground, holding it just shy of top-tier.

▶ Play Kimi K3's attempt →
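Two entries above quote a smoke-test pixel diff (Fugu Mini: PASS at 0.9%; Fugu Ultra: MAYBE at 0.0%). This page doesn't say how that number is computed, so what follows is only a sketch under stated assumptions: it treats the diff as the percentage of pixels that change between a frame captured before scripted input and one captured after, and it labels a run MAYBE when almost nothing changed, since a static frame can't confirm that movement or mouse-look works. The function names `percent_changed` and `smoke_verdict`, the `tolerance` parameter, and the 0.1% threshold are all hypothetical.

```python
from typing import Sequence, Tuple

# A pixel as an (R, G, B) tuple of 0-255 integers.
Pixel = Tuple[int, int, int]


def percent_changed(before: Sequence[Pixel], after: Sequence[Pixel], tolerance: int = 0) -> float:
    """Percentage of pixels whose largest per-channel difference exceeds `tolerance`.

    Assumption: this is one plausible definition of a smoke-test "diff",
    not the benchmark's documented method.
    """
    if len(before) != len(after):
        raise ValueError("frames must contain the same number of pixels")
    if not before:
        return 0.0
    changed = sum(
        1
        for p, q in zip(before, after)
        if max(abs(a - b) for a, b in zip(p, q)) > tolerance
    )
    return 100.0 * changed / len(before)


def smoke_verdict(diff_pct: float, threshold: float = 0.1) -> str:
    """Hypothetical rule: visible change after input is a PASS; a frozen frame is MAYBE."""
    return "PASS" if diff_pct >= threshold else "MAYBE"


if __name__ == "__main__":
    # 1,000-pixel frames where 9 pixels change after input: 0.9% diff.
    before = [(10, 10, 10)] * 1000
    after = [(10, 10, 10)] * 991 + [(200, 10, 10)] * 9
    pct = percent_changed(before, after)
    print(f"{pct:.1f}% diff -> {smoke_verdict(pct)}")
```

Under this reading, a pointer-lock game that never received mouse input would render the same frame twice and score 0.0%. That is why such a result is "needs manual verification" rather than a failure.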

The winner on Skyrim

Claude Fable 5 took gold on this task.

See Claude Fable 5's full model card: /models/fable-5.

