AI Page benchmarks

1 Page task where every frontier AI model gets the same one-shot prompt. Live, playable demos. Real 0–10 scores from Julian Goldie.

Top model in this category

GLM-5.2 →

What I'm testing in the Page category

Page tasks ('landing page for a fictional product') are the closest thing on the bench to real product work. The question is: would I ship this if a client asked? On the Nova 1 landing page, Opus and GLM tied at 9/10 — both shippable as-is. On the bench, this is the category where the model's design taste matters more than its code quality.

Every Page task on the bench

1 task, 5 total demos across all models. Click a task to see how every AI model handled the same prompt, side by side, live and playable.

Page

Landing

Landing Page — modern marketing landing page (one-shot).

5 models

How I score Page tasks

Same three axes as the rest of the bench: runs (does the .html open to a working page), hits the brief (is the thing I asked for what came back), looks good (visual polish, motion, attention to detail). 0–10 each, averaged. Highest score on each task earns gold; second silver; third bronze. Models without a 0–10 verdict are listed as unranked on the leaderboard.
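The scoring above can be sketched in a few lines of Python. This is an illustration only, not the bench's actual code: the function names, the model names and the example scores are all hypothetical. The source also doesn't say how ties are broken, so the sketch assumes tied models share the same medal.

```python
MEDALS = ["gold", "silver", "bronze"]


def task_score(runs, hits_brief, looks_good):
    """Average the three 0-10 axis scores into a single task score."""
    axes = (runs, hits_brief, looks_good)
    for value in axes:
        if not 0 <= value <= 10:
            raise ValueError(f"axis score {value} is outside 0-10")
    return sum(axes) / len(axes)


def award_medals(scores):
    """Map each model to 'gold', 'silver', 'bronze', None or 'unranked'.

    `scores` maps a model name to its task score, or to None when the
    model has no 0-10 verdict. Assumption: tied models share a medal.
    """
    # Distinct scores, best first; only the top three earn a medal.
    distinct = sorted(
        {s for s in scores.values() if s is not None}, reverse=True
    )
    medal_for = {s: MEDALS[i] for i, s in enumerate(distinct[:3])}

    result = {}
    for model, score in scores.items():
        if score is None:
            result[model] = "unranked"
        else:
            # None here means the model is ranked but outside the top three.
            result[model] = medal_for.get(score)
    return result


# Hypothetical example: two models tie for first, one has no verdict.
example = {
    "model_a": task_score(10, 9, 8),
    "model_b": task_score(9, 9, 9),
    "model_c": task_score(8, 7, 7.5),
    "model_d": None,
}
print(award_medals(example))
```

Sharing medals on a tie keeps two equally shippable pages from being separated by an arbitrary tie-break. A different rule, such as favouring the higher "looks good" score, would only change how `medal_for` is built.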

Source guides for the Page category: see the methodology page for full data provenance.

Other categories: Game, Sim, Visual · all tasks · all models

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders

258 documented wins

38 countries

$100k+/mo community MRR

Join AIPB · $59/mo → Read the Agent OS guides →

Join 3,600+ founders building with this stack.