Twilightvale

Twilight Vale — 3D open-world RPG with combat, terrain, weather.

Category: Game
Models tested: 5
Scored: 4/5
Avg score: 7.75/10
Winner: Grok

What I asked each model — the Twilightvale prompt

Every model on this page got this exact prompt inside the Agent Operating System: Twilight Vale — 3D open-world RPG with combat, terrain, weather.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 5 frontier models have attempted it so far: Fusion, Grok, Kimi K2.7, MiniMax M3, GLM-5.2.

Why this task matters. Twilightvale is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.

How each model handled Twilightvale

Scored on my 0–10 scale, using the source comparison guides on agentos.guide. Open any entry to play the actual one-shot HTML the model produced.

Fusion OpenRouter
• 4.0/10

What I saw: TRUNCATED. Eight input handlers are wired, but there is no rAF (requestAnimationFrame) loop, no closing </html>, and an unclosed <script>. The world won't tick, the day/night cycle won't advance, and NPCs won't move. My earlier 9.5 ('winner · best demo on bench') was wrong: I had scored on prose density, not function. Honest demotion.

▶ Play Fusion's attempt →
Grok xAI
🥇 9.5/10 · winner · open world depth

What I saw: Twilight Vale — 3D open-world RPG with hand-crafted village, NPCs, combat, day/night, weather, inventory. 38KB — densest build of the bench, edges out Fusion's 32KB.

▶ Play Grok's attempt →
Kimi K2.7 Moonshot AI
🥉 8.5/10

What I saw: 64KB open-world RPG with village, NPCs, combat, day/night cycle. Densest Kimi build.

▶ Play Kimi K2.7's attempt →
MiniMax M3 MiniMax
🥈 9.0/10 · winner · biggest open world

What I saw: 47KB — densest open-world. Village, NPCs, combat, day/night, weather, inventory.

▶ Play MiniMax M3's attempt →
GLM-5.2 Zhipu / Z.ai
• unranked

Demo on the bench. Not scored yet — play it and form your own opinion.

▶ Play GLM-5.2's attempt →
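
The three failure modes noted for Fusion's build (no rAF loop, no closing </html>, an unclosed <script>) can be checked mechanically before a demo is opened in a browser. What follows is a minimal sketch, not the check used on this bench: the function names are hypothetical, and the string matching is a heuristic that would miscount a build whose JavaScript contains the literal text "<script" inside a string.

```python
import re


def check_single_file_build(html: str) -> dict:
    """Heuristic static checks for a one-shot, single-file HTML game.

    Assumption: these mirror the symptoms described in the Fusion
    write-up; they are not the bench's actual tooling.
    """
    lowered = html.lower()
    script_opens = len(re.findall(r"<script\b", lowered))
    script_closes = len(re.findall(r"</script\s*>", lowered))
    return {
        # A game loop normally schedules itself with requestAnimationFrame.
        "has_raf_loop": "requestanimationframe" in lowered,
        # Output cut off mid-generation usually lacks the closing tag.
        "has_closing_html": "</html>" in lowered,
        # Every opened <script> should have a matching close.
        "scripts_closed": script_opens == script_closes,
    }


def looks_truncated(html: str) -> bool:
    """True if any of the three checks fails."""
    return not all(check_single_file_build(html).values())


# Hypothetical inputs for illustration only.
truncated = "<html><body><script>addEventListener('keydown', onKey);"
complete = (
    "<html><body><script>"
    "function tick(){ requestAnimationFrame(tick); } tick();"
    "</script></body></html>"
)

print(looks_truncated(truncated))
print(looks_truncated(complete))
```

A check like this only flags structural truncation. It says nothing about whether the game is playable, so it complements the browser test rather than replacing it.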

The winner on Twilightvale

Grok took gold on this task, tagged winner · open world depth.

See Grok's full model card: /models/grok. Direct head-to-head against the runner-up: Grok vs MiniMax M3.

How I scored Twilightvale — methodology

Three axes, 0–10 each, averaged:
• Runs: drop the .html in a browser; if it opens to a broken page, it scores zero.
• Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier?
• Looks good: visual polish, motion, interactivity. This is where most of the gap between gold and silver lives.
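
The arithmetic behind the three-axis score and the bench average can be sketched as follows. This assumes equal weighting across the axes and that a broken page zeroes only the Runs axis, neither of which the page states explicitly. The function names and the per-axis values in the example are hypothetical; only the final per-model scores are taken from this page.

```python
from statistics import mean


def task_score(runs: float, hits_brief: float, looks_good: float) -> float:
    """Average three 0-10 axis scores with equal weight (an assumption)."""
    axes = [runs, hits_brief, looks_good]
    for axis in axes:
        if not 0 <= axis <= 10:
            raise ValueError("each axis is scored 0-10")
    return round(mean(axes), 2)


def bench_average(scores: dict) -> float:
    """Average over scored models only; None marks an unscored demo."""
    scored = [s for s in scores.values() if s is not None]
    return round(mean(scored), 2)


# Final scores as published on this page; GLM-5.2 is not scored yet.
published = {
    "Fusion": 4.0,
    "Grok": 9.5,
    "Kimi K2.7": 8.5,
    "MiniMax M3": 9.0,
    "GLM-5.2": None,
}

print(bench_average(published))

# Hypothetical axis values: a build that fails to run but scores 6 on the
# other two axes would average to 4.0 under these assumptions.
print(task_score(0.0, 6.0, 6.0))
```

Leaving unscored demos out of the average, rather than counting them as zero, is what makes four scored models out of five produce 7.75.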

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$100k+/mo community MRR