Every benchmark task

All AI benchmark tasks

32 one-shot creative coding tasks across 4 categories — every task is a fixed prompt I send to every frontier model on the bench. Same input, single HTML file out, scored 0–10 on three axes. Click any task to see how each model handled the same prompt, side by side.
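As a rough illustration of the setup described above, here is a minimal sketch of how one task and its per-model scores could be represented. This is not the bench's actual data model. The axis names are placeholders (the page only says there are three axes, each scored 0–10), and ranking models by their mean score is an assumption made for the example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple

# Placeholder axis names: the page says "three axes" but does not name them.
AXES = ("axis_a", "axis_b", "axis_c")


@dataclass
class Task:
    slug: str      # e.g. "arcade"
    category: str  # "Game", "Page", "Sim", or "Visual"
    prompt: str    # the fixed prompt, sent unchanged to every model
    scores: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def record(self, model: str, **axis_scores: int) -> None:
        """Store one model's scores after checking the axes and the 0-10 range."""
        if set(axis_scores) != set(AXES):
            raise ValueError(f"expected scores for exactly {AXES}")
        for axis, value in axis_scores.items():
            if not 0 <= value <= 10:
                raise ValueError(f"{axis} must be between 0 and 10, got {value}")
        self.scores[model] = dict(axis_scores)

    def side_by_side(self) -> List[Tuple[str, float]]:
        """Return (model, mean score) pairs, best first (mean is an assumption)."""
        rows = [(model, mean(s.values())) for model, s in self.scores.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

With something like this, `task.record("model-x", axis_a=8, axis_b=6, axis_c=7)` stores one cell, and `task.side_by_side()` lists every model that attempted the same prompt, highest mean first. The model names here are hypothetical.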

Game

Game
Arcade
Arcade — classic arcade-style game (pick: tetris, breakout, snake).
5 models
Game
Crypt
Crypt — torch-lit dungeon crawler.
1 model
Game
Dogfight
Dogfight — air-combat shooter.
1 model
Game
Doom
Doom — put monsters in the raycaster maze and let them chase you.
3 models
Game
Game
Game — generic 'make a game' open prompt.
3 models
Game
Neoncity
Neon City — cyberpunk neon-lit city you drive through.
3 models
Game
Outrun
Outrun — synthwave horizon driving game with pseudo-3D road.
3 models
Game
Pool
Pool — physically simulated billiards game.
1 model
Game
Racing
3D Racer — third-person racing game with a track and obstacles.
1 model
Game
Raycaster
Raycaster Maze — build a Wolfenstein-style 3D maze you can walk through.
3 models
Game
Rpg
RPG — top-down RPG with sprites, combat, inventory.
1 model
Game
Skyrim
Skyrim-lite — first-person open-world fantasy explorer.
1 model

Page

Page
Landing
Landing Page — modern marketing landing page (one-shot).
5 models

Sim

Sim
Blackhole
Black Hole — gravitational lensing visualisation.
4 models
Sim
Boids
Boids — flocking-birds emergent behaviour simulation.
2 models
Sim
Cloth
Cloth — physical cloth simulation draping over an object.
2 models
Sim
Fluid
Fluid — WebGL fluid simulation with swirling particles.
5 models
Sim
Fractal
Fractal — interactive fractal explorer (mandelbrot or julia).
4 models
Sim
Galaxy
Galaxy — particle galaxy you can swirl with your mouse.
7 models
Sim
Orbit
Orbit — N-body gravitational simulation.
5 models
Sim
Pathtracer
Path Tracer — physically-correct ray-traced renderer.
3 models
Sim
Reactiondiff
Reaction-Diffusion — Turing pattern generator.
2 models
Sim
Solar
Solar — accurate planetary solar system.
6 models
Sim
Wormhole
Wormhole — 3D wormhole tunnel flythrough.
1 model

Visual

Visual
Aurora
Aurora — northern lights animation.
1 model
Visual
Fireworks
Fireworks — interactive fireworks display.
1 model
Visual
Lavalamp
Lava Lamp — slow blob morph animation.
2 models
Visual
Matrix
Matrix — Matrix-rain falling-glyphs animation.
1 model
Visual
Synthwave
Synthwave — sunset-grid synthwave loop.
2 models
Visual
Terrain
Terrain — procedural 3D terrain explorer.
3 models
Visual
Voxel
Voxel — voxel-art landscape (Minecraft-style).
5 models
Visual
Waves
Waves — animated ocean wave simulation.
1 model

Why these tasks?

Every task on this page is a creative coding prompt that frontier AI models should be able to ship in one shot. The categories were picked to stress different capabilities: Game tasks test geometry + game-loop + input + state; Sim tasks test math + visual taste; Visual tasks test pure aesthetic judgment; Page tasks test product instinct.

If a model can ship the Game tasks cleanly, it's safe to wire into an agent loop. If it can ship Visual cleanly, it has design taste — useful for content workflows. The benches you read about online (HumanEval, MMLU, SWE-bench) measure a different thing entirely; this bench measures what shows up when you ask a frontier model for a working artifact.

How I add a new task

A new task lands when one of three things happens: (1) a new frontier model ships and an existing task needs an updated baseline; (2) someone inside the AI Profit Boardroom proposes one and we agree it's a useful test; (3) I'm building something with the Agent OS and the prompt itself is novel enough to deserve a permanent cell.

The bar is: would a working operator try the prompt themselves? If yes — it's a task. If it's a synthetic stress test of one capability, it's probably better measured by HumanEval-style benchmarks; that's not what Goldie Bench is for.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$100k+/mo community MRR