Q: What's the prompt for the Webos test?

Web-OS Desktop — a working desktop with windows, dock, Notes / Paint / Terminal apps. Every model receives this exact prompt, one shot, single HTML file out.

Q: What's the best AI model for Webos?

Fusion — A tiny working desktop OS in 24KB — wallpaper, taskbar dock with app icons, draggable resizable windows for Notes, Paint, Terminal (echo-only), Calculator. Apple-Sequoia aesthetic. Most ambitious application build on the bench.

Q: How many AI models attempted Webos?

24 models on Goldie Bench have attempted Webos: Claude Fable 5, Fugu Ultra, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K2.7, Kimi K3, MiniMax M3, Hermes MoA, Muse Spark 1.2, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, DeepSeek V4 Pro, DeepSeek V4 Flash.

Web-OS Desktop — a working desktop with windows, dock, Notes / Paint / Terminal apps.

Category: Page

Models tested: 24

Scored: 19/24

Avg score: 7.90/10

Winner: Fusion
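To make the headline numbers concrete, here is a minimal sketch of how a "tested / scored / average" summary could be computed. It assumes the average is taken over scored attempts only, which the page does not state explicitly. The ten scores are the ones shown in the ranked entries on this page; `Example Unscored Model` is a hypothetical placeholder, and `summarize` is an illustrative name, not part of the bench.

```python
from statistics import mean
from typing import Optional

# Scores copied from the ranked entries on this page. None marks an attempt
# without a score. "Example Unscored Model" is a hypothetical placeholder.
ATTEMPTS: dict[str, Optional[float]] = {
    "Claude Fable 5": 8.0,
    "Fugu Ultra": 7.0,
    "Fugu Mini": 6.5,
    "Fusion": 9.0,
    "Gemini 3.6 Flash": 8.0,
    "GLM-5.2": 7.5,
    "GPT-5.6 Sol": 8.4,
    "Grok": 9.0,
    "Inkling": 7.8,
    "Kimi K2.7": 8.0,
    "Example Unscored Model": None,
}


def summarize(attempts: dict[str, Optional[float]]) -> dict:
    """Count attempts and average only the ones that have a score (assumption)."""
    scored = [s for s in attempts.values() if s is not None]
    return {
        "tested": len(attempts),
        "scored": len(scored),
        "avg": round(mean(scored), 2) if scored else None,
    }


summary = summarize(ATTEMPTS)
print(summary)  # {'tested': 11, 'scored': 10, 'avg': 7.92}
```

The 7.92 here is the mean of the ten scores visible on this page, so it is not expected to match the 7.90 headline figure, which covers all 19 scored attempts.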

What I asked each model — the Webos prompt

Every model on this page got this exact prompt inside the Agent Operating System: Web-OS Desktop — a working desktop with windows, dock, Notes / Paint / Terminal apps.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 24 frontier models have attempted it so far: Claude Fable 5, Fugu Ultra, Fugu Mini, Fusion, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K2.7, Kimi K3, MiniMax M3, Hermes MoA, Muse Spark 1.2, Opus 4.8, Claude Opus 5, Qwen 3.8, Qwen 3.7, Claude Sonnet 5, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, DeepSeek V4 Pro, DeepSeek V4 Flash.

Why this task matters. Webos is a textbook test of page-class capability: window management, a dock, and several working apps have to coexist in a single HTML file, which exposes whether a model is pattern-matching or actually reasoning. Shipping this cleanly is the floor for what I expect from a frontier model, and every model on the leaderboard should at least attempt it.

How each model handled Webos

Ranked by my 0–10 score from the source comparison guides on agentos.guide. Click any entry to play the actual one-shot HTML the model produced.

Claude Fable 5 Anthropic

• 8.0/10

What I saw: Renders cleanly with a polished menubar, blurred glass dock with hover labels and running-app dots, and all three apps (Notes, Paint, Terminal) launched and cascading correctly with working traffic-light controls. Strong and shippable, but lacks a maximize/restore and the visuals are competent rather than standout versus the 9.0 top build, so it lands solidly in the strong-but-not-winning tier.

▶ Play Claude Fable 5's attempt →

Fugu Ultra Sakana AI

• 7.0/10

What I saw: Ultra v2 build of the desktop OS. The smoke test returned MAYBE (0.3% diff) because the desktop stays static until you open an app, which is expected for a UI-shell build.

▶ Play Fugu Ultra's attempt →

Fugu Mini Sakana AI

• 6.5/10

What I saw: Mini gap-fill (round 2) build of a desktop OS shell. The smoke test returned MAYBE (0.0% diff): the page is static until an app is opened, so it is flagged for manual verification.

▶ Play Fugu Mini's attempt →

Fusion OpenRouter

🥇 9.0/10 · winner · most ambitious app

What I saw: A tiny working desktop OS in 24KB — wallpaper, taskbar dock with app icons, draggable resizable windows for Notes, Paint, Terminal (echo-only), Calculator. Apple-Sequoia aesthetic. Most ambitious application build on the bench.

▶ Play Fusion's attempt →

Gemini 3.6 Flash Google

• 8.0/10

What I saw: Strong, polished glassmorphic desktop with a 3D WebGL wireframe background, working top bar, dock (Terminal/Paint/Notes/Settings), and a functional-looking terminal with prompt; the visible Paint window appears empty (only toolbar/slider showing, no canvas content) which slightly undercuts the multi-app showcase. Clean and shippable but not quite topping the best.

▶ Play Gemini 3.6 Flash's attempt →

GLM-5.2 Zhipu / Z.ai

• 7.5/10

What I saw: 61KB · plays clean · plain

▶ Play GLM-5.2's attempt →

GPT-5.6 Sol OpenAI

• 8.4/10 · Polished glass desktop

What I saw: Strong, cohesive glassmorphic Web-OS with functional Notes and Terminal windows, a polished dock, desktop icons, topbar and helpful hints — clearly shippable and near the top of the field. Slightly short of the best since Paint isn't shown open and the desktop heading text is partly occluded, but overall a very refined, on-brief build.

▶ Play GPT-5.6 Sol's attempt →

Grok xAI

🥇 9.0/10 · winner · ambitious desktop

What I saw: Web-OS desktop with wallpaper, dock, draggable resizable windows for Notes/Paint/Terminal/Calculator. 33KB — beats Fusion's 24KB attempt on density.

▶ Play Grok's attempt →

Inkling Thinking Machines

• 7.8/10

What I saw: Renders cleanly with polished dock, desktop icons, and a functional Terminal window with prompt; drag, close/minimize dots, Paint canvas and localStorage Notes all present per source. Weak point: the title banner is partially hidden behind the window and the empty terminal body looks sparse, making it feel slightly less finished than the top field entries.

▶ Play Inkling's attempt →

Kimi K2.7 Moonshot AI

• 8.0/10

What I saw: 30KB working desktop — wallpaper, dock, draggable windows.

▶ Play Kimi K2.7's attempt →
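The "Smoke-test MAYBE (0.3% diff)" notes on the Fugu Ultra and Fugu Mini entries read like a frame-difference check: capture the page twice and see how many pixels changed. The page does not describe the actual test, so the following is only a sketch under that assumption. The function names, the flat pixel-list frame format, the `PASS` label, and the 1.0% threshold are all hypothetical.

```python
def diff_percent(frame_a: list[int], frame_b: list[int]) -> float:
    """Percentage of pixels that differ between two equally sized frames.

    Frames are flat lists of pixel values (hypothetical format).
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(frame_a, frame_b) if a != b)
    return 100.0 * changed / len(frame_a)


def classify(pct: float, pass_threshold: float = 1.0) -> str:
    """Label a run: enough visible change passes, otherwise a human should look.

    The threshold and the "PASS" label are assumptions; only "MAYBE"
    appears on this page.
    """
    return "PASS" if pct >= pass_threshold else "MAYBE"


# A 1000-pixel frame where only 3 pixels change, e.g. a ticking clock
# on an otherwise static desktop.
before = [0] * 1000
after = [0] * 997 + [1] * 3

pct = diff_percent(before, after)
print(f"{pct:.1f}% diff -> {classify(pct)}")  # 0.3% diff -> MAYBE
```

Under this reading, a desktop shell that waits for a click would naturally score near 0% without being broken, which matches the "static until an app is opened" notes and explains why such runs are flagged for manual verification rather than failed.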

The winner on Webos

Fusion took gold on this task with 9.0/10, tagged winner and most ambitious app. Grok also scored 9.0/10 and carries a winner tag in the ranking above.

See Fusion's full model card: /models/fusion.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.

▶ LIVE

Kimi K3

Moonshot AI

Strong, highly polished render: crisp macOS-style traffic-light windows, blurred glass dock with running dots, desktop icons, animated 3D wireframe backdrop and starfield, plus a clean welcome/about card — clearly on-brief with Notes/Paint/Terminal. Source confirms real window management (drag/resize/min/max), autosave, and functional apps; only mild risk is unseen app depth, but presentation and completeness edge past the field's best.

▶ LIVE

MiniMax M3

MiniMax

38KB working desktop — wallpaper, dock, draggable Notes/Paint/Terminal/Calculator windows.

▶ LIVE

Hermes MoA

Hermes · Mixture of Agents

Polished webOS shell with animated starfield wallpaper, topbar with clock, dock and desktop launchers, draggable/resizable/min/max windows with traffic-light controls, autosaving Notes, a DPR-aware resizable Paint with rainbow brush, and a Terminal. The careful pointer-capture and ResizeObserver canvas handling edge it slightly past Fusion/Grok (9.0) in craft. The source is truncated mid-Paint, so full Terminal/Calculator verification isn't possible; rated on the COMPLETE flag.

▶ LIVE

Muse Spark 1.2

How I scored Webos — methodology

Three axes, 0–10 each, averaged.

Runs: drop the .html in a browser; if it opens to a broken page, it scores zero.

Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier?

Looks good: visual polish, motion, interactivity. This is where most of the gap between gold and silver lives.

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.
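The three-axis scoring described above can be sketched as a small function. This is an illustration under assumptions, not the bench's actual scoring code: the names `AxisScores` and `overall_score` are hypothetical, the result is rounded to one decimal to match the scores shown on this page, and "scores zero" is read as zeroing the Runs axis rather than the whole result.

```python
from dataclasses import dataclass


@dataclass
class AxisScores:
    runs: float        # does the .html open to a working page?
    hits_brief: float  # did the model ship what the prompt asked for?
    looks_good: float  # visual polish, motion, interactivity


def overall_score(scores: AxisScores, page_broken: bool = False) -> float:
    """Average the three axes, each expected in the 0-10 range.

    Assumption: a broken page zeroes the Runs axis only.
    """
    runs = 0.0 if page_broken else scores.runs
    axes = (runs, scores.hits_brief, scores.looks_good)
    for value in axes:
        if not 0.0 <= value <= 10.0:
            raise ValueError("each axis must be between 0 and 10")
    return round(sum(axes) / len(axes), 1)


print(overall_score(AxisScores(runs=9, hits_brief=8, looks_good=7)))   # 8.0
print(overall_score(AxisScores(9, 6, 6), page_broken=True))            # 4.0
```

If a broken page is instead meant to zero the overall score, the function would return 0.0 early when `page_broken` is true; the page's wording leaves both readings open.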
