What's the best AI model for Gtadrive?

GLM-5.2 is the listed winner at 8.0/10 (47KB · plays clean · three, webgl).

How many AI models attempted Gtadrive?

7 models on Goldie Bench have attempted Gtadrive: Fusion, GLM-5.2, Grok, Kimi K2.7, MiniMax M3, Opus 4.8, Qwen 3.7.

What's the prompt for the Gtadrive test?

GTA Drive — open-city driving sandbox: steal cars, outrun cops, traffic, wanted level, minimap. Every model receives this exact prompt, one shot, single HTML file out.

Game

Gtadrive

GTA Drive — open-city driving sandbox: steal cars, outrun cops, traffic, wanted level, minimap.

Category: Game

Models tested: 7

Scored: 7/7

Avg score: 7.93/10

Winner: GLM-5.2

What I asked each model — the Gtadrive prompt

Every model on this page got this exact prompt inside the Agent Operating System: GTA Drive — open-city driving sandbox: steal cars, outrun cops, traffic, wanted level, minimap.

Single HTML file out. No iteration. No examples in the system prompt. Whatever each model produced on the first run is what's on this page, except where an entry is marked as re-rolled (Opus 4.8). 7 frontier models have attempted it so far: Fusion, GLM-5.2, Grok, Kimi K2.7, MiniMax M3, Opus 4.8, Qwen 3.7.

Why this task matters. Gtadrive is a textbook test of game-class capability — the kind of build that exposes whether a model is doing pattern-matching or actual reasoning. A model that ships this in one shot is usually safe to wire into your agent loop for harder tasks of the same shape.

How each model handled Gtadrive

Scored 0–10 by me, using the source comparison guides on agentos.guide, and listed alphabetically rather than by rank. Click any entry to play the actual one-shot HTML the model produced.

Fusion OpenRouter

• 7.5/10

What I saw: 114KB · plays clean · plain

▶ Play Fusion's attempt →

GLM-5.2 Zhipu / Z.ai

🥇 8.0/10

What I saw: 47KB · plays clean · three, webgl

▶ Play GLM-5.2's attempt →

Grok xAI

🥇 8.0/10

What I saw: 11KB · plays clean · three, webgl, input

▶ Play Grok's attempt →

Kimi K2.7 Moonshot AI

🥇 8.0/10

What I saw: 23KB · plays clean · three, webgl

▶ Play Kimi K2.7's attempt →

MiniMax M3 MiniMax

🥇 8.0/10

What I saw: 33KB · plays clean · three, webgl

▶ Play MiniMax M3's attempt →

Opus 4.8 Anthropic

🥇 8.0/10

What I saw: 22KB · plays clean · three, webgl (re-rolled)

▶ Play Opus 4.8's attempt →

Qwen 3.7 Alibaba

🥇 8.0/10

What I saw: 21KB · plays clean · three, webgl

▶ Play Qwen 3.7's attempt →
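As a quick check on the summary figures, the 7.93/10 average and the tie at the top can be reproduced from the scores listed above. This is a minimal Python sketch; the dictionary and variable names are illustrative and not part of the bench itself.

```python
# Scores copied from the per-model entries above (0-10 scale).
scores = {
    "Fusion": 7.5,
    "GLM-5.2": 8.0,
    "Grok": 8.0,
    "Kimi K2.7": 8.0,
    "MiniMax M3": 8.0,
    "Opus 4.8": 8.0,
    "Qwen 3.7": 8.0,
}

# Mean across all seven attempts.
average = sum(scores.values()) / len(scores)

# Every model that shares the highest score.
top_score = max(scores.values())
tied_at_top = sorted(name for name, s in scores.items() if s == top_score)

print(f"Average: {average:.2f}/10")
print(f"Top score {top_score}/10 shared by: {', '.join(tied_at_top)}")
```

Six of the seven models share the top score, so the average sits just below 8.0.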

The winner on Gtadrive

GLM-5.2 took gold on this task, level at 8.0/10 with Grok, Kimi K2.7, MiniMax M3, Opus 4.8, and Qwen 3.7.

What I saw: 47KB · plays clean · three, webgl

See GLM-5.2's full model card: /models/glm.

Every attempt — live, playable

Side by side. Click any tile to run that model's actual one-shot HTML in a new tab.

Fusion (OpenRouter): 114KB · plays clean · plain

GLM-5.2 🥇 (Zhipu / Z.ai): 47KB · plays clean · three, webgl

Grok 🥇 (xAI): 11KB · plays clean · three, webgl, input

Kimi K2.7 🥇 (Moonshot AI): 23KB · plays clean · three, webgl

MiniMax M3 🥇 (MiniMax): 33KB · plays clean · three, webgl

Opus 4.8 🥇 (Anthropic): 22KB · plays clean · three, webgl (re-rolled)

Qwen 3.7 🥇 (Alibaba): 21KB · plays clean · three, webgl

How I scored Gtadrive — methodology

Three axes, 0–10 each, averaged.

Runs: drop the .html in a browser; if it opens to a broken page, it scores zero.

Hits the brief: did the model ship the thing the prompt asked for, or a different thing it found easier.

Looks good: visual polish, motion, interactivity. This is where most of the gap between gold and silver lives.
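The rubric above can be sketched as a small function. This is a hedged illustration, not the bench's actual scoring code: the names `AxisScores` and `overall_score` are hypothetical, and it assumes a broken page zeroes only the Runs axis (the text could also be read as zeroing the whole attempt).

```python
from dataclasses import dataclass


@dataclass
class AxisScores:
    """Hypothetical container for the three axes, each on a 0-10 scale."""
    runs: float            # does the .html open to a working page?
    hits_the_brief: float  # did it ship what the prompt asked for?
    looks_good: float      # visual polish, motion, interactivity


def overall_score(axes: AxisScores, page_broken: bool = False) -> float:
    """Average the three axes.

    Assumption: a broken page sets the Runs axis to zero and leaves
    the other two axes as scored.
    """
    runs = 0.0 if page_broken else axes.runs
    values = [runs, axes.hits_the_brief, axes.looks_good]
    for v in values:
        if not 0.0 <= v <= 10.0:
            raise ValueError(f"axis score out of range: {v}")
    return sum(values) / len(values)


example = AxisScores(runs=9.0, hits_the_brief=8.0, looks_good=7.0)
print(overall_score(example))                    # working page
print(overall_score(example, page_broken=True))  # broken page
```

Under this reading, a broken page still earns credit on the other two axes; if the intended rule is that a broken page scores zero overall, the function would return 0.0 early instead.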

My scores trace back to the source comparison guides on agentos.guide. See the full methodology page for data provenance, including which source guide each cell's score came from.


The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders

258 documented wins

38 countries

$59/mo (monthly)

