Q: What's the prompt for the Orbit test?

Orbit — N-body gravitational simulation. Every model receives this exact prompt, one shot, single HTML file out.

Q: What's the best AI model for Orbit?

Opus 4.8 — Opus nailed the brief — labelled planet orbits, a real NEO / close-pass panel, a sim clock. GLM went for drama: a glowing nebula swirl that's gorgeous but reads more galaxy than orbit map. Kimi's is accurate but dim and sparse.

Q: How many AI models attempted Orbit?

23 models on Goldie Bench have attempted Orbit: Claude Fable 5, Fugu Ultra, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K2.7, Kimi K3, MiniMax M3, Hermes MoA, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, DeepSeek V4 Pro, DeepSeek V4 Flash.

Orbit — N-body gravitational simulation.

Category: Sim

Models tested: 23

Scored: 18/23

Avg score: 7.53/10

Winner: Opus 4.8

What I asked each model — the Orbit prompt

Every model on this page got this exact prompt inside the Agent Operating System: Orbit — N-body gravitational simulation.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 23 frontier models have attempted it so far.

Why this task matters. Orbit is a textbook test of sim-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. Shipping this cleanly is the floor for what I expect from a frontier model — every model on the leaderboard should at least attempt it.

How each model handled Orbit

Each entry carries my 0–10 score from the source comparison guides on agentos.guide. Click any to play the actual one-shot HTML the model produced.

Claude Fable 5 (Anthropic)

• 7.6/10

What I saw: Renders cleanly with a glowing sun, colored planets, a starfield, and solid 3D orbit/zoom/spawn controls, plus real N-body physics with softening and substeps. However, the screenshot shows no visible orbital trails and a somewhat sparse, static-looking layout, which keeps it strong but generic rather than a task winner.

▶ Play Claude Fable 5's attempt →
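For readers unfamiliar with the terms, "softening" and "substeps" can be sketched roughly as below. This is a minimal illustration under my own assumptions, not the code Claude Fable 5 produced: the constants `G` and `SOFTENING`, the function names, the 2D layout, and the choice of a semi-implicit Euler update are all hypothetical.

```python
import math

G = 1.0           # gravitational constant in arbitrary sim units (assumed)
SOFTENING = 0.05  # softening length (hypothetical value)


def accelerations(pos, mass, eps=SOFTENING):
    """Pairwise softened gravitational acceleration for bodies in 2D."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            # Adding eps**2 keeps the denominator finite when bodies nearly overlap.
            r2 = dx * dx + dy * dy + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += G * mass[j] * dx * inv_r3
            acc[i][1] += G * mass[j] * dy * inv_r3
    return acc


def step_frame(pos, vel, mass, frame_dt, substeps=8):
    """Advance one rendered frame as several smaller physics steps (in place)."""
    dt = frame_dt / substeps
    for _ in range(substeps):
        acc = accelerations(pos, mass)
        for i in range(len(pos)):
            # Semi-implicit Euler: update velocity first, then position.
            vel[i][0] += acc[i][0] * dt
            vel[i][1] += acc[i][1] * dt
            pos[i][0] += vel[i][0] * dt
            pos[i][1] += vel[i][1] * dt
```

Softening trades a little accuracy at close range for stability during near-collisions, and substeps keep the physics step small even when the frame time is large.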

Fugu Ultra (Sakana AI)

• 8.5/10

What I saw: 26KB inner-solar-system orbit map with a glassmorphic info panel, kicker badge, blurred backdrop, hover cards. Cleaner UI than Fusion's same-task attempt — beats it on polish.

▶ Play Fugu Ultra's attempt →

Fugu Mini (Sakana AI)

• 8.0/10

What I saw: Inner-system orbit map with hover info. Smoke-test PASS.

▶ Play Fugu Mini's attempt →

Fusion (OpenRouter)

• 8.0/10

What I saw: Top-down inner-system map: Mercury / Venus / Earth / Mars orbiting the sun with accurate relative speeds. Per-planet colour palette, info card on hover, controls bar at bottom with speed slider and play/pause. Solid hit on the brief, ties with GLM on the same task.

▶ Play Fusion's attempt →
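One way to sanity-check "accurate relative speeds" for an inner-system map is Kepler's third law, which for bodies orbiting the Sun gives a period in years of roughly a^1.5, with a the semi-major axis in AU. The sketch below is my own illustration, not Fusion's code; the function names are hypothetical and the axis values are rounded.

```python
# Approximate semi-major axes in astronomical units (rounded values).
SEMI_MAJOR_AXIS_AU = {
    "Mercury": 0.387,
    "Venus": 0.723,
    "Earth": 1.000,
    "Mars": 1.524,
}


def period_years(a_au):
    """Kepler's third law for solar orbits: T [years] = a [AU] ** 1.5."""
    return a_au ** 1.5


def relative_speeds():
    """Mean angular speed of each planet relative to Earth (Earth = 1.0)."""
    return {name: 1.0 / period_years(a) for name, a in SEMI_MAJOR_AXIS_AU.items()}


if __name__ == "__main__":
    for name, speed in relative_speeds().items():
        print(f"{name}: {speed:.2f}x Earth's angular speed")
```

With these values Mercury should sweep around the sun a little over four times faster than Earth, and Mars at roughly half Earth's rate, which is a quick visual check for any top-down orbit map.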

Gemini 3.6 Flash (Google)

• 8.1/10

What I saw: Strong 3D presentation with a glowing sun, orbiting planets, a gravity-well grid, and a full control suite (presets, G/time/trail sliders, launch mode). However, the FPS counter reads only 9, which signals a performance issue that hurts the sim's smoothness.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 (Zhipu / Z.ai)

• 7.5/10

What I saw: Opus nailed the brief — labelled planet orbits, a real NEO / close-pass panel, a sim clock. GLM went for drama: a glowing nebula swirl that's gorgeous but reads more galaxy than orbit map. Kimi's is accurate but dim and sparse.

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol (OpenAI)

🥉 8.6/10 · polished 3D orbits

What I saw: Strong 3D render with a glowing sun, a ringed body, orbital trails, a polar grid, and a clean glassy UI with live stats, energy drift, an inspector, and comet/reset controls. Clearly on-brief and shippable. Minor caveat: it's a curated N-body-flavored system rather than a chaotic true N-body demo, but the polish and completeness edge it to task-winner level.

▶ Play GPT-5.6 Sol's attempt →

Grok (xAI)

• 8.5/10

What I saw: A proper inner solar system now — a glowing Sun, four planets riding clean elliptical rings, a starfield, a data HUD and play/speed controls. The vague first sentence drew blurry circles; this one reads instantly.

▶ Play Grok's attempt →

Inkling (Thinking Machines)

• 7.2/10

What I saw: Clean render with a polished title, colorful glowing bodies, a starfield, and working 3D orbit/spawn interactions. But the physics uses non-symplectic Euler integration with a hard bounds hack, there are no orbital trails, and the scattered layout doesn't visually read as gravitational clustering. Competent but generic next to the field's best.

▶ Play Inkling's attempt →
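Why non-symplectic Euler costs points: on an orbit it steadily gains energy, so bodies spiral outward, which is likely why a bounds hack is needed at all. The sketch below compares explicit Euler with velocity Verlet (a symplectic scheme I picked for contrast) on a single body circling a fixed central mass. It is an illustration under my own assumptions, not Inkling's code; `GM`, the step size, and all function names are hypothetical.

```python
import math

GM = 1.0  # G times the central mass, in arbitrary sim units (assumed)


def accel(x, y):
    """Acceleration toward a fixed central mass at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3


def energy(x, y, vx, vy):
    """Specific orbital energy: kinetic plus potential, per unit mass."""
    return 0.5 * (vx * vx + vy * vy) - GM / math.hypot(x, y)


def euler_step(x, y, vx, vy, dt):
    """Explicit (non-symplectic) Euler: both updates use the old state."""
    ax, ay = accel(x, y)
    return x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt


def verlet_step(x, y, vx, vy, dt):
    """Velocity Verlet: half kick, full drift, half kick."""
    ax, ay = accel(x, y)
    vx += 0.5 * ax * dt
    vy += 0.5 * ay * dt
    x += vx * dt
    y += vy * dt
    ax, ay = accel(x, y)
    vx += 0.5 * ax * dt
    vy += 0.5 * ay * dt
    return x, y, vx, vy


def relative_energy_drift(step, steps=10_000, dt=0.01):
    """Relative change in energy after integrating a circular orbit."""
    state = (1.0, 0.0, 0.0, 1.0)  # radius 1, circular speed 1
    e0 = energy(*state)
    for _ in range(steps):
        state = step(*state, dt)
    return abs((energy(*state) - e0) / e0)


if __name__ == "__main__":
    print("explicit Euler drift:", relative_energy_drift(euler_step))
    print("velocity Verlet drift:", relative_energy_drift(verlet_step))
```

In this setup the Euler drift should come out orders of magnitude larger than the Verlet drift at the same step size, which is the kind of gap an on-screen energy readout makes visible.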

Kimi K2.7 (Moonshot AI)

• 6.0/10

▶ Play Kimi K2.7's attempt →

The winner on Orbit

Opus 4.8 took gold on this task, winning on accuracy.

See Opus 4.8's full model card: /models/opus.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.
