
Best AI model for Wormhole

Wormhole — 3D wormhole tunnel flythrough.

The prompt — what I asked each model

Every model on this page got the same fixed prompt inside the Agent Operating System: Wormhole — 3D wormhole tunnel flythrough.

Single HTML file out. No iteration. No "best of N." No examples in the system prompt. Whatever each model produced on the first run is what's on this page. One model has attempted it so far: Kimi K2.7.

What counts as winning here

This is the sim category on Goldie Bench. The question isn't "did the model write code that compiles" — the question is "did the model ship a thing you'd actually use." For Wormhole that means three things, in order:

  1. Does it run? Drop the .html file in a browser. If it opens to a broken page, it scores zero on the first axis.
  2. Did it hit the brief? The prompt asks for a specific thing. A model that ships a different thing — however polished — gets docked on the brief axis.
  3. Does it look good? Visual polish, motion, interactivity, attention to detail. This is where the difference between gold and silver usually lives.

Final score is the average of my honest 0–10 ratings on those three axes. I haven't published the per-model scores for this task yet; the demos are still on the bench.
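To make the averaging concrete, here is a minimal Python sketch. The function name `final_score`, the equal weighting, and the rounding to two decimals are assumptions for illustration; the page only says the three axis scores are averaged. The numbers in the usage line are hypothetical, not real bench results.

```python
def final_score(runs: float, brief: float, looks: float) -> float:
    """Average three 0-10 axis scores (equal weights are an assumption)."""
    axes = (runs, brief, looks)
    for value in axes:
        # Each axis is scored on a 0-10 scale.
        if not 0 <= value <= 10:
            raise ValueError(f"axis score out of range: {value}")
    return round(sum(axes) / len(axes), 2)


# Hypothetical example: a page that fails to open scores zero on the
# first axis, which drags the average down even if the rest is decent.
print(final_score(0, 6, 7))  # 4.33
```

Under this reading, a demo that doesn't run can still earn a nonzero final score from the other two axes, which matches the "scores zero on the first axis" wording above.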

Every model's attempt — ranked by my 0–10 score

Models ranked by medal (highest score = 🥇 gold, second = 🥈 silver, third = 🥉 bronze). Click any tile to play that model's actual one-shot HTML.
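The medal rule above can be sketched in a few lines of Python. The helper `assign_medals` and the model names are hypothetical, and tie handling is an assumption (ties keep their input order here); the page doesn't say how ties are broken.

```python
MEDALS = ["🥇", "🥈", "🥉"]


def assign_medals(scores: dict[str, float]) -> list[tuple[str, str, float]]:
    """Rank models by score, highest first, and attach medals to the top three."""
    # sorted() is stable, so models with equal scores keep their input order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (MEDALS[i] if i < len(MEDALS) else "", name, score)
        for i, (name, score) in enumerate(ranked)
    ]


# Hypothetical scores, not real bench results.
for medal, name, score in assign_medals({"model-a": 6.0, "model-b": 8.5}):
    print(medal, name, score)
```

With a single attempt on the board, as is the case for this task so far, the only entry would take gold by default.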

How I tested this — the methodology in 60 seconds

Every comparison on Goldie Bench follows the same recipe:

  1. I pick a creative coding prompt that a frontier model should be able to ship in one shot.
  2. I dispatch the exact same prompt to each model from the kanban inside the Agent Operating System.
  3. I save whatever .html file the model produced on the first run. No iteration. No coaching.
  4. I score each result 0–10 on my three axes (runs / hits the brief / looks good).
  5. I publish the scores publicly in the source comparison guides on agentos.guide — and that's what feeds this page.

See the methodology page for full data provenance, including which source guides each cell's score came from.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$100k+/mo community MRR