AI Other benchmarks

3 Other tasks where every frontier AI model gets the same one-shot prompt. Live, playable demos. Real 0–10 scores from Julian Goldie.

What I'm testing in the Other category

The Other category covers 3 one-shot creative coding tasks where the model has to ship a complete, working build from a single fixed prompt.

Every Other task on the bench

3 tasks, 3 total demos across all models. Click any task to see how every AI model handled the same prompt — side by side, live and playable.

Other

Matrixrain

Matrixrain — auto-discovered task.

1 model

Other

Mlx Speedtest

Mlx Speedtest — auto-discovered task.

1 model

Other

Neonsnake

Neonsnake — auto-discovered task.

1 model

How I score Other tasks

Same three axes as the rest of the bench: runs (does the .html open to a working page), hits the brief (is the thing I asked for what came back), looks good (visual polish, motion, attention to detail). 0–10 each, averaged. Highest score on each task earns gold; second silver; third bronze. Models without a 0–10 verdict are listed as unranked on the leaderboard.
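Here is a rough sketch of that scoring in Python, not the bench's actual code. It assumes the three axes are weighted equally (a plain mean), that a model missing any axis counts as having no verdict, and that ties keep their input order. The `Verdict` class, the `rank` function, and the model names are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MEDALS = ("gold", "silver", "bronze")


@dataclass
class Verdict:
    """One model's result on one task. Axis names follow the text above."""
    model: str
    runs: Optional[float] = None        # does the .html open to a working page
    hits_brief: Optional[float] = None  # is it the thing that was asked for
    looks_good: Optional[float] = None  # visual polish, motion, detail

    def score(self) -> Optional[float]:
        axes = (self.runs, self.hits_brief, self.looks_good)
        # Assumption: a missing axis means there is no 0-10 verdict at all.
        if any(axis is None for axis in axes):
            return None
        # Assumption: equal weighting, i.e. a plain mean of the three axes.
        return sum(axes) / len(axes)


def rank(verdicts: List[Verdict]) -> Dict[str, str]:
    """Map each model to gold, silver, bronze, ranked, or unranked."""
    scored = [v for v in verdicts if v.score() is not None]
    # sorted() is stable, so tied scores keep their input order (an assumption;
    # the text does not say how ties are broken).
    scored.sort(key=lambda v: v.score(), reverse=True)

    placements: Dict[str, str] = {}
    for position, verdict in enumerate(scored):
        placements[verdict.model] = (
            MEDALS[position] if position < len(MEDALS) else "ranked"
        )
    for verdict in verdicts:
        if verdict.score() is None:
            placements[verdict.model] = "unranked"
    return placements


if __name__ == "__main__":
    # Hypothetical models and scores, for illustration only.
    results = [
        Verdict("model-a", runs=9, hits_brief=8, looks_good=7),
        Verdict("model-b", runs=10, hits_brief=10, looks_good=10),
        Verdict("model-c"),  # no verdict yet
    ]
    print(rank(results))
```

With these made-up inputs, model-b takes gold, model-a takes silver, and model-c stays unranked because it has no verdict.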

Source guides for the Other category: see the methodology page for full data provenance.

Other categories: Game, Page, Sim, Visual · all tasks · all models

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders

258 documented wins

38 countries

$59/mo, monthly

Join AIPB · $59/mo → Read the Agent OS guides →
