Raycaster
Raycaster Maze — build a Wolfenstein-style 3D maze you can walk through.
What I asked each model — the Raycaster prompt
Every model on this page got this exact prompt inside the Agent Operating System: Raycaster Maze — build a Wolfenstein-style 3D maze you can walk through.
Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. Three frontier models have attempted it so far: GLM-5.2, Kimi K2.7, and Opus 4.8.
Why this task matters. Raycaster is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.
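For readers who have not built one: the core of a Wolfenstein-style raycaster is usually a grid walk (often called DDA) that steps a ray from cell to cell until it hits a wall. The sketch below is a minimal illustration of that idea, written in Python rather than the browser JavaScript the models produced. The map layout, the `cast_ray` name, and the `max_steps` limit are all assumptions made for this example; none of it is taken from any model's output.

```python
import math

# Hypothetical map for illustration: 1 = wall, 0 = open floor.
MAZE = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]

def cast_ray(grid, px, py, angle, max_steps=64):
    """Walk the grid along a ray from (px, py); return the distance along
    the ray to the first wall cell, or None if no wall is found.
    Assumes px and py are non-negative and inside the grid."""
    dx, dy = math.cos(angle), math.sin(angle)
    map_x, map_y = int(px), int(py)

    # Ray length needed to cross one full cell horizontally / vertically.
    delta_x = abs(1 / dx) if dx != 0 else math.inf
    delta_y = abs(1 / dy) if dy != 0 else math.inf

    # Step direction, and ray length to the first grid line on each axis.
    if dx < 0:
        step_x, side_x = -1, (px - map_x) * delta_x
    else:
        step_x, side_x = 1, (map_x + 1 - px) * delta_x
    if dy < 0:
        step_y, side_y = -1, (py - map_y) * delta_y
    else:
        step_y, side_y = 1, (map_y + 1 - py) * delta_y

    for _ in range(max_steps):
        # Always advance to whichever grid line the ray reaches first.
        if side_x < side_y:
            dist = side_x
            side_x += delta_x
            map_x += step_x
        else:
            dist = side_y
            side_y += delta_y
            map_y += step_y
        if not (0 <= map_y < len(grid) and 0 <= map_x < len(grid[0])):
            return None  # ray left the map without hitting anything
        if grid[map_y][map_x] == 1:
            return dist
    return None
```

A renderer would typically cast one such ray per screen column, correct the distance for fisheye by multiplying by the cosine of the ray's offset from the view direction, and draw a wall slice whose height is inversely proportional to that distance.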
How each model handled Raycaster
Ranked by my 0–10 score from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.
What I saw: Kimi nailed it — brick walls, a checkered floor, a clean minimap, textbook Wolfenstein, runs clean out of the box. Opus's is close and more atmospheric: warm fog and a vignette down a stone corridor (A/D to turn, W/S to move). GLM's engine is genuinely good — brick and mossy-stone walls, fog, a minimap — but its one-shot spawned the player buried inside a wall, dead on arrival; I nudged the start one cell so you can actually walk it. That spawn bug is why it scores lowest here, even though the e
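The spawn-inside-a-wall failure is cheap to guard against. The sketch below shows one possible check, assuming the maze is a grid of 0 (open) and 1 (wall) cells; the `find_open_spawn` name and the example map are hypothetical and are not GLM's code. It returns the requested start cell if it is open, and otherwise the nearest open cell found by a breadth-first search, which is roughly an automated version of the one-cell nudge described above.

```python
from collections import deque

def find_open_spawn(grid, start):
    """Return `start` (row, col) if that cell is open, otherwise the nearest
    open cell by 4-neighbour breadth-first search. Returns None if the grid
    has no open cell at all."""
    rows, cols = len(grid), len(grid[0])
    sr, sc = start
    if not (0 <= sr < rows and 0 <= sc < cols):
        raise ValueError("start is outside the grid")

    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == 0:
            return (r, c)
        # The search deliberately passes through walls: the point is to
        # escape from a start cell that is itself a wall.
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None

# Hypothetical map: the requested start (2, 2) is a wall cell.
EXAMPLE_MAZE = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
spawn = find_open_spawn(EXAMPLE_MAZE, (2, 2))  # (2, 3), one cell to the right
```

In a real build the player would normally be placed at the centre of the returned cell (column + 0.5, row + 0.5) so that they do not start flush against a wall edge.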
The winner on Raycaster
Kimi K2.7 took gold on this task: the winner, with the cleanest build.
See Kimi K2.7's full model card: /models/kimi. Direct head-to-head against the runner-up: Kimi K2.7 vs Opus 4.8.
Every attempt — live, playable
Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.
How I scored Raycaster — methodology
Three axes, 0–10 each, averaged.
Runs: drop the .html in a browser; if it opens to a broken page, it scores zero.
Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier?
Looks good: visual polish, motion, and interactivity. This is where most of the gap between gold and silver lives.
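The averaging can be sketched as follows. This is an illustration, not the scoring script behind this page: the `AxisScores` and `overall_score` names are invented here, the rounding to one decimal place is an assumption, and the sketch assumes a broken page zeroes only the Runs axis rather than the whole score. The numbers in the usage lines are made up for the example and are not any model's actual scores.

```python
from dataclasses import dataclass

@dataclass
class AxisScores:
    runs: float        # 0-10: does the .html open and work in a browser?
    hits_brief: float  # 0-10: did the model ship what the prompt asked for?
    looks_good: float  # 0-10: visual polish, motion, interactivity

def overall_score(scores: AxisScores) -> float:
    """Average the three axes, rejecting values outside the 0-10 range."""
    for name, value in vars(scores).items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    total = scores.runs + scores.hits_brief + scores.looks_good
    return round(total / 3, 1)

# Made-up inputs, for illustration only.
working_build = overall_score(AxisScores(runs=9, hits_brief=8, looks_good=7))  # 8.0
broken_build = overall_score(AxisScores(runs=0, hits_brief=8, looks_good=7))   # 5.0
```

Under this reading, a build that fails to open can still collect points on the other two axes; if a broken page is meant to zero the whole score, the function would need an early return instead.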
My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.