AI Visual benchmarks

8 visual tasks where every frontier AI model gets the same one-shot prompt. Live, playable demos. Real 0–10 scores from Julian Goldie.

Top model in this category

GLM-5.2

What I'm testing in the Visual category

Visual tasks (aurora, fireworks, lava lamp, voxel landscape, synthwave) are pure aesthetic tests. There's no game loop to debug, just the question of whether the model ships a polished, on-brand visual on the first try. GLM-5.2 tends to dominate here, and Opus is consistent. Kimi plays it plainer on these than on game and sim tasks, and its Visual scores are what drag its average down to bronze.

Every Visual task on the bench

8 tasks, 16 demos in total across all models. Click any task to see how each model that attempted it handled the same prompt, side by side, live and playable.

Aurora: northern lights animation (1 model)

Fireworks: interactive fireworks display (1 model)

Lava Lamp: slow blob morph animation (2 models)

Matrix: Matrix-rain falling-glyphs animation (1 model)

Synthwave: sunset-grid synthwave loop (2 models)

Terrain: procedural 3D terrain explorer (3 models)

Voxel: voxel-art landscape, Minecraft-style (5 models)

Waves: animated ocean wave simulation (1 model)

How I score Visual tasks

Same three axes as the rest of the bench: runs (does the .html open to a working page?), hits the brief (is what came back the thing I asked for?), and looks good (visual polish, motion, attention to detail). Each axis is scored 0–10, and the task score is the average of the three. The highest score on each task earns gold, the second-highest silver, and the third-highest bronze. Models without a 0–10 verdict are listed as unranked on the leaderboard.
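The scoring rule above can be sketched in a few lines of Python. This is a minimal sketch, not the bench's actual tooling: the `Verdict` and `award_medals` names are hypothetical, the model names are placeholders with made-up scores, and it assumes three things the page doesn't spell out: a model missing any axis counts as unranked, tied scores keep their input order, and models placed fourth or lower simply get no medal.

```python
from dataclasses import dataclass
from typing import Optional

MEDALS = ("gold", "silver", "bronze")


@dataclass
class Verdict:
    """One model's 0-10 scores on one task (hypothetical structure)."""
    model: str
    runs: Optional[float]   # does the .html open to a working page?
    brief: Optional[float]  # is what came back what was asked for?
    looks: Optional[float]  # visual polish, motion, attention to detail

    def __post_init__(self) -> None:
        # Reject any axis score outside the 0-10 scale.
        for axis in (self.runs, self.brief, self.looks):
            if axis is not None and not 0 <= axis <= 10:
                raise ValueError(f"axis score {axis} is outside 0-10")

    def score(self) -> Optional[float]:
        """Average of the three axes, or None if any axis is unscored."""
        axes = (self.runs, self.brief, self.looks)
        if any(axis is None for axis in axes):
            return None
        return sum(axes) / len(axes)


def award_medals(verdicts: list[Verdict]) -> dict[str, Optional[str]]:
    """Map each model on one task to 'gold', 'silver', 'bronze', None, or 'unranked'."""
    scored = [(v.model, v.score()) for v in verdicts]
    # Assumption: sorted() is stable (even with reverse=True),
    # so tied scores keep their input order.
    ranked = sorted(
        (pair for pair in scored if pair[1] is not None),
        key=lambda pair: pair[1],
        reverse=True,
    )
    medals: dict[str, Optional[str]] = {}
    for place, (model, _) in enumerate(ranked):
        # Places beyond third get no medal (None).
        medals[model] = MEDALS[place] if place < len(MEDALS) else None
    for model, score in scored:
        if score is None:
            medals[model] = "unranked"  # no 0-10 verdict
    return medals


# Placeholder models and scores, for illustration only.
example = [
    Verdict("model-a", 10, 9, 8),           # average 9.0
    Verdict("model-b", 10, 10, 10),         # average 10.0
    Verdict("model-c", 9, 8, 7),            # average 8.0
    Verdict("model-d", 6, 6, 6),            # average 6.0
    Verdict("model-e", None, None, None),   # no verdict
]
medals = award_medals(example)
```

Keeping the three axes as separate fields, rather than storing only the average, makes it easy to see whether a low score came from a page that didn't run or from weak polish.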

Source guides for the Visual category: see the methodology page for full data provenance.

Other categories: Game, Page, Sim.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders

258 documented wins

38 countries

$100k+/mo community MRR

Join AIPB ($59/mo) · Read the Agent OS guides
