Game

Doom

Doom — put monsters in the raycaster maze and let them chase you.

Category: Game
Models tested: 3
Scored: 3/3
Avg score: 8.33/10
Winner: Kimi K2.7

What I asked each model — the Doom prompt

Every model on this page got this exact prompt inside the Agent Operating System: Doom — put monsters in the raycaster maze and let them chase you.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. Three frontier models have attempted it so far: GLM-5.2, Kimi K2.7, and Opus 4.8.

Why this task matters. Doom is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.
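For readers who have not built one, the "let them chase you" half of the prompt can be sketched in a few lines. This is a minimal illustration, assuming the maze is a grid of wall and floor cells and that a monster moves one cell per tick along a shortest path found with breadth-first search. The prompt does not say how monsters should chase, and none of the three attempts is claimed to work this way; `MAZE`, `next_step`, and the coordinates are hypothetical names made up for this example.

```python
from collections import deque

Cell = tuple[int, int]  # (row, column)

# Hypothetical maze: '#' is a wall, anything else is walkable floor.
MAZE = [
    "#######",
    "#P..#.#",
    "#.#.#.#",
    "#.#...#",
    "#...#M#",
    "#######",
]


def next_step(maze: list[str], monster: Cell, player: Cell) -> Cell:
    """Return the cell the monster should move to on a shortest path to the player.

    The search runs outward from the player, so the parent recorded for the
    monster's cell is already the neighbour that is one step closer.
    If no path exists, the monster stays where it is.
    """
    if monster == player:
        return monster
    rows, cols = len(maze), len(maze[0])
    parent: dict[Cell, Cell | None] = {player: None}
    queue = deque([player])
    while queue:
        r, c = queue.popleft()
        if (r, c) == monster:
            return parent[monster]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and maze[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return monster  # player unreachable


# Walk the monster from its start cell to the player, one cell per tick.
player, monster = (1, 1), (4, 5)
ticks = 0
while monster != player:
    monster = next_step(MAZE, monster, player)
    ticks += 1
```

In a real raycaster the monster would usually move in continuous coordinates and only use the grid path as a steering target; the grid-only version above just keeps the idea small.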

How each model handled Doom

Ranked by my 0–10 score from the source comparison guides on agentos.guide. Click any to play the actual one-shot HTML the model produced.

GLM-5.2 (Zhipu / Z.ai)
🥉 8.0/10

What I saw: All three are real, playable shooters. Opus drops you in a corridor with an imp dead ahead: gun, crosshair and HUD framed like a screenshot. Kimi matches it: a monster down a textured hall, health, ammo, minimap. GLM ships a gorgeous 'HAZARD PROTOCOL' title screen with a working game behind it, though it spawns you facing a wall. Opus edges it by a hair on the cleanest fight.

▶ Play GLM-5.2's attempt →

Kimi K2.7 (Moonshot AI)
🥇 8.5/10

▶ Play Kimi K2.7's attempt →

Opus 4.8 (Anthropic)
🥇 8.5/10 · winner · game-feel

▶ Play Opus 4.8's attempt →

The winner on Doom

Kimi K2.7 took gold on this task with 8.5/10, the same score as Opus 4.8, which is tagged winner on game-feel.

See Kimi K2.7's full model card: /models/kimi.

How I scored Doom — methodology

Three axes, 0–10 each, averaged:

Runs: drop the .html in a browser; if it opens to a broken page, it scores zero.
Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier?
Looks good: visual polish, motion, interactivity. This is where most of the gap between gold and silver lives.
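The arithmetic behind a score can be sketched as follows. This is a rough illustration rather than the actual scoring tool: the function name `task_score` and the per-axis values in the example are made up, and it assumes a broken page zeroes only the Runs axis (the rule could also be read as zeroing the whole score). The three task scores at the end are the ones published on this page.

```python
from statistics import mean


def task_score(runs: float, hits_brief: float, looks_good: float) -> float:
    """Average three 0-10 axis scores into one 0-10 task score.

    Assumption: a page that opens broken is passed in as runs=0,
    so it drags the average down instead of zeroing it outright.
    """
    axes = (runs, hits_brief, looks_good)
    for value in axes:
        if not 0 <= value <= 10:
            raise ValueError(f"axis score out of range: {value}")
    return mean(axes)


# Illustrative axis values only; the page does not publish per-axis numbers.
example = task_score(runs=10, hits_brief=8, looks_good=7.5)

# Bench-level average over the three task scores listed on this page.
published = {"GLM-5.2": 8.0, "Kimi K2.7": 8.5, "Opus 4.8": 8.5}
bench_average = round(mean(published.values()), 2)
```

With these inputs the bench-level figure rounds to the 8.33/10 average shown at the top of the page.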

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$100k+/mo community MRR