Galaxy

Galaxy — particle galaxy you can swirl with your mouse.

Category: Sim
Models tested: 7
Scored: 3/7
Avg score: 8.17/10
Winner: Opus 4.8

What I asked each model — the Galaxy prompt

Every model on this page got this exact prompt inside the Agent Operating System: Galaxy — particle galaxy you can swirl with your mouse.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. Seven frontier models have attempted it so far: GLM-5.2, Kimi K2.7, Opus 4.8, Grok, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, and Kimi K2.7 · Quality.

Why this task matters. Galaxy is a textbook test of sim-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. Shipping this cleanly is the floor for what I expect from a frontier model — every model on the leaderboard should at least attempt it.

How each model handled Galaxy

Each entry shows my 0–10 score, taken from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

GLM-5.2 (Zhipu / Z.ai)
🥈 8.0/10

What I saw: Opus built a proper interactive 3D galaxy — drag to orbit a 7,000-star cloud around a glowing core. Kimi's is the prettiest single frame: a clean tilted spiral disk with rainbow arms. GLM's runs on a canvas with a slick NGC-style HUD and zoom, just less dramatic at a glance. Three good galaxies, three different bets.

▶ Play GLM-5.2's attempt →
Kimi K2.7 (Moonshot AI)
🥈 8.0/10

▶ Play Kimi K2.7's attempt →
Opus 4.8 (Anthropic)
🥇 8.5/10 · winner · interactive 3D

▶ Play Opus 4.8's attempt →
Grok (xAI)
• unranked

Demo on the bench. Not scored yet — play it and form your own opinion.

▶ Play Grok's attempt →
Kimi K2.7 · Fast (Moonshot AI)
• unranked

Demo on the bench. Not scored yet — play it and form your own opinion.

▶ Play Kimi K2.7 · Fast's attempt →
Kimi K2.7 · No-Think (Moonshot AI)
• unranked

Demo on the bench. Not scored yet — play it and form your own opinion.

▶ Play Kimi K2.7 · No-Think's attempt →
Kimi K2.7 · Quality (Moonshot AI)
• unranked

Demo on the bench. Not scored yet — play it and form your own opinion.

▶ Play Kimi K2.7 · Quality's attempt →

The winner on Galaxy

Opus 4.8 took gold on this task with an interactive 3D build, scoring 8.5/10.

See Opus 4.8's full model card: /models/opus. Direct head-to-head against the runner-up: Opus 4.8 vs GLM-5.2.

How I scored Galaxy — methodology

Three axes, 0–10 each, averaged. Runs: drop the .html in a browser; if it opens to a broken page, it scores zero. Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier. Looks good: visual polish, motion, interactivity — where most of the gap between gold and silver lives.
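As a rough illustration of the arithmetic, the sketch below averages three 0-10 axis scores into one task score, then averages task scores into a page-level figure. The function names `task_score` and `page_average` are hypothetical, the per-axis numbers in the usage lines are made up (per-axis scores are not published on this page), and treating a broken page as a zero on the Runs axis only is an assumption about how that rule is applied.

```python
from statistics import mean


def task_score(runs: float, hits_brief: float, looks_good: float) -> float:
    """Average the three axes (each 0-10) into one task score.

    Assumption: a page that opens broken gets runs=0, and the other
    two axes are still averaged in as given.
    """
    axes = [runs, hits_brief, looks_good]
    for value in axes:
        if not 0 <= value <= 10:
            raise ValueError("each axis is scored 0-10")
    return round(mean(axes), 2)


def page_average(scores: list[float]) -> float:
    """Average the task scores of the models that have been scored."""
    return round(mean(scores), 2)


# Hypothetical per-axis values, for illustration only.
print(task_score(10, 8, 7.5))

# The three published task scores on this page: 8.0, 8.0 and 8.5.
print(page_average([8.0, 8.0, 8.5]))  # 8.17
```

With the three scored models on this page, the second call reproduces the 8.17/10 average shown in the summary. Unranked models are simply left out of the list rather than counted as zero.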

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.
