Vendor

All xAI AI models

Elon Musk's xAI — Grok lives inside X (Twitter) Premium. Real-time and snappy.

Models on bench: 1
Total task attempts: 13
Scored cells: 0
Gold medals: 🥇 0

My take on xAI

xAI's Grok lives inside X (Twitter) Premium. Its unique pitch is real-time access to X timeline data, which no other model on the bench has. On bench scoring, Grok has 13 demos but no curated 0–10 verdicts yet, so it is currently unranked. The 256K context window keeps it competitive on the spec sheet.

Where I use xAI inside the Agent OS: each model below has a "How I use it" line on its detail page. That is the daily-usage view, not the marketing pitch.

Every xAI model on Goldie Bench

Click any card for the full model card, every demo, and direct head-to-head comparisons.

How I tested xAI's models

Every model on this page received the exact same fixed prompt as every other model on the bench. Each got one shot and produced a single HTML file, scored 0–10 by me on three axes (runs, hits the brief, looks good). The scoring is published in my source comparison guides on agentos.guide; see the methodology page for full data provenance.

Vendor: x.ai

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$100k+/mo community MRR