How many AI models can do Doom?

Three models on Goldie Bench have attempted Doom: Kimi K2.7, Opus 4.8, and GLM-5.2.

Best AI model for Doom

Q: What's the best AI model for Doom?

Kimi K2.7 — All three are real, playable shooters. Opus drops you in a corridor with an imp dead ahead — gun, crosshair and HUD framed like a screenshot. Kimi matches it: a monster down a textured hall, health, ammo, minimap. GLM ships a gorgeous 'HAZARD PROTOCOL' title screen with a working game behind it, though it too spawns facing a wall. Opus by a hair on the cleanest fight.

Doom — put monsters in the raycaster maze and let them chase you.

See Kimi K2.7's full track record →

The prompt — what I asked each model

Every model on this page got the same fixed prompt inside the Agent Operating System: Doom — put monsters in the raycaster maze and let them chase you.

Single HTML file out. No iteration. No "best of N." No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 3 models have attempted it so far — Kimi K2.7, Opus 4.8, GLM-5.2.
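For a sense of what the brief involves, here is a minimal sketch of the "let them chase you" half of the prompt. It is not taken from any model's output: the `MAZE` layout, the `next_step` name, and the choice of breadth-first search are all assumptions made for illustration, and the models' actual games are single HTML files, not Python.

```python
from collections import deque

# Hypothetical maze: '#' is a wall, '.' is open floor.
# Assumption: the outer border is solid wall, so neighbour lookups stay in bounds.
MAZE = [
    "#######",
    "#..#..#",
    "#.....#",
    "#..#..#",
    "#######",
]

def next_step(maze, monster, player):
    """Return the cell the monster should move to next, one step closer to the player.

    Runs a breadth-first search outward from the player; the first time the
    search touches the monster's cell, the cell it came from is the best move.
    """
    if monster == player:
        return monster
    queue = deque([player])
    seen = {player}
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cell = (row + d_row, col + d_col)
            if cell in seen or maze[cell[0]][cell[1]] == "#":
                continue
            if cell == monster:
                return (row, col)  # neighbour of the monster on a shortest path
            seen.add(cell)
            queue.append(cell)
    return monster  # no route to the player: stay put

# Calling next_step once per game tick walks the monster toward the player.
```

A real raycaster would also need line-of-sight checks, sprite rendering, and sub-cell movement; this only covers the pathing decision on the grid.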

What counts as winning here

This is the game category on Goldie Bench. The question isn't "did the model write code that compiles" — the question is "did the model ship a thing you'd actually use." For Doom that means three things, in order:

Does it run? Drop the .html file in a browser. If it opens to a broken page, it scores zero on the first axis.
Did it hit the brief? The prompt asks for a specific thing. A model that ships a different thing — however polished — gets docked on the brief axis.
Does it look good? Visual polish, motion, interactivity, attention to detail. This is where the difference between gold and silver usually lives.

The final score is the average of my honest 0–10 ratings on the three axes. Across the 3 models I've scored on this task so far, the average score is 8.33/10.
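The arithmetic can be sketched as follows. This assumes the three axes are weighted equally, which the page does not state explicitly; the function names and the example numbers are hypothetical and are not the published per-axis scores.

```python
def final_score(runs, brief, looks):
    """Average three 0-10 axis scores (assumed equal weights) into one score."""
    axes = (runs, brief, looks)
    for axis in axes:
        if not 0 <= axis <= 10:
            raise ValueError("axis scores must be between 0 and 10")
    return round(sum(axes) / len(axes), 2)

def task_average(final_scores):
    """Average the final scores of every model that attempted a task."""
    return round(sum(final_scores) / len(final_scores), 2)

# Hypothetical values for illustration only.
print(final_score(9, 8, 8))      # 8.33
print(task_average([9, 8, 8]))   # 8.33
```

Any three scores that sum to 25 give the same 8.33 average, so the task-level figure alone does not reveal the individual scores.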

Every model's attempt — ranked by my 0–10 score

Models ranked by medal (highest score = 🥇 gold, second = 🥈 silver, third = 🥉 bronze). Click any tile to play that model's actual one-shot HTML.

Kimi K2.7 🥇

Moonshot AI

All three are real, playable shooters. Opus drops you in a corridor with an imp dead ahead — gun, crosshair and HUD framed like a screenshot. Kimi matches it: a monster down a textured hall, health, ammo, minimap. GLM ships a gorgeous 'HAZARD PROTOCOL' title screen with a working game behind it, though it too spawns facing a wall. Opus by a hair on the cleanest fight.
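The medal rule for this ranking (highest score gets gold, second silver, third bronze) can be sketched as below. The model names and scores are placeholders, not the real results, and the tie-breaking behaviour is an assumption, since the page does not say how ties are handled.

```python
MEDALS = ["🥇", "🥈", "🥉"]

def assign_medals(scores):
    """Rank models by score, highest first, and attach a medal to the top three.

    `scores` maps a model name to its final 0-10 score.
    Assumption: ties keep their input order (Python's sort is stable).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (MEDALS[place] if place < len(MEDALS) else "", name, score)
        for place, (name, score) in enumerate(ranked)
    ]

# Placeholder names and scores for illustration only.
for medal, name, score in assign_medals({"model_a": 8.0, "model_b": 9.0, "model_c": 7.5}):
    print(medal, name, score)
```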

How I tested this — the methodology in 60 seconds

Every comparison on Goldie Bench follows the same recipe:

I pick a creative coding prompt that a frontier model should be able to ship in one shot.
I dispatch the exact same prompt to each model from the kanban inside the Agent Operating System.
I save whatever .html file the model produced on the first run. No iteration. No coaching.
I score each result 0–10 on my three axes (runs / hits the brief / looks good).
I publish the scores in the source comparison guides on agentos.guide, and those scores are what feed this page.
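The dispatch-and-save part of that recipe could look roughly like the sketch below. It is not the Agent Operating System's actual code: `run_one_shot` and the per-model `dispatch` callables are hypothetical stand-ins for whatever sends a prompt to a model and returns its HTML.

```python
from pathlib import Path

def run_one_shot(prompt, models, out_dir):
    """Send the same prompt to every model once and save each first response.

    `models` maps a model name to a hypothetical callable that takes the
    prompt text and returns the generated HTML as a string.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for name, dispatch in models.items():
        html = dispatch(prompt)  # exactly one call: no retries, no best-of-N
        path = out_dir / f"{name}.html"
        path.write_text(html, encoding="utf-8")
        saved[name] = path
    return saved
```

The point of the design is that each model's callable is invoked exactly once with an identical prompt, so whatever comes back first is what gets scored.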

See the methodology page for full data provenance, including which source guides each cell's score came from.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders

258 documented wins

38 countries

$100k+/mo community MRR

Join AIPB · $59/mo → Read the Agent OS guides →