Anthropic

Claude Sonnet 5

The agentic SWE frontier — 82% SWE-bench Verified, Dev Team mode.

Context: 1,000,000 tokens
Pricing: $3 / $15 per M tokens ($2 / $10 intro)
Tasks tested: 42
Avg score: 7.18/10
Medals: 🥇 3 · 🥈 3 · 🥉 3
Release: 2026-06-30
Official site: anthropic.com
Official vendor source
Claude Sonnet 5 is built by Anthropic — see the vendor's own product page, pricing, and docs at anthropic.com.

Reference benchmarks for Claude Sonnet 5

These are external benchmarks I pulled from the source comparison guides on agentos.guide: SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, and vendor-reported coding scores. They are not Goldie Bench medal scores, which come only from same-prompt one-shot creative coding tasks in the matrix. I surface them here so the spec sheet for Claude Sonnet 5 is honest about what's measured.

SWE-bench Verified: 82.1% (one-shot GitHub-issue repair)
First model past 80% on SWE-bench Verified

What is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's frontier model with a 1,000,000-token context window, released 2026-06-30. Tagline: The agentic SWE frontier (82% SWE-bench Verified, Dev Team mode). Official source: anthropic.com.

Pricing detail. $3.00 input / $15.00 output per million tokens; introductory $2.00/$10.00 through 2026-08-31.
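
To make the pricing line concrete, here is a minimal sketch of a per-request cost estimate. The rates and the cutoff date come from the pricing detail above; the function name `estimate_cost`, the constant names, and the reading of "through 2026-08-31" as inclusive of that day are my own assumptions, not anything Anthropic publishes.

```python
from datetime import date

# USD per million tokens (input, output), copied from the pricing detail above.
STANDARD_RATES = (3.00, 15.00)
INTRO_RATES = (2.00, 10.00)

# Assumption: "through 2026-08-31" includes that day itself.
INTRO_END = date(2026, 8, 31)


def estimate_cost(input_tokens: int, output_tokens: int, on: date) -> float:
    """Estimate the USD cost of one request made on the given date.

    Hypothetical helper for illustration; it ignores anything the page
    does not mention (caching, batch discounts, taxes).
    """
    in_rate, out_rate = INTRO_RATES if on <= INTRO_END else STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Same request priced inside and outside the introductory window.
    print(estimate_cost(200_000, 20_000, date(2026, 7, 15)))
    print(estimate_cost(200_000, 20_000, date(2026, 9, 15)))
```

Under those assumptions, a request with 200,000 input tokens and 20,000 output tokens comes to $0.60 at the introductory rates and $0.90 at the standard rates.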

How I use it inside the Agent OS. I reach for it when the job is iterative, tool-using software engineering. For one-shot visual builds, GLM-5.2 (free) beat it 4–1 here.

What I built with Claude Sonnet 5

Every model on Goldie Bench gets the same fixed prompt set (one shot, single HTML file out), and I score the result 0–10 inside the Agent Operating System. Here's what Claude Sonnet 5 shipped on the bench: 42 one-shot demos. All 42 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

  • 82.1% SWE-bench Verified — first model past 80% on real GitHub-issue repair
  • Dev Team multi-agent mode + 1M context for repo-level agentic work
  • Precision on hard logic — won the raycaster the open-weight field kept botching

Trade-offs

  • One-shot creative-visual builds trail GLM 5.2 here (lost 4 of 5) — no iteration to catch its own bugs
  • A temporal-dead-zone bug blanked its N-body orbit sim on the first shot

Best for

  • Agentic software engineering — write / run / test / fix loops on real repos
  • Repo-level reasoning across a 1M-token context (Dev Team multi-agent mode)
  • Precise logic — raycasters, physics — where one-shot open models slip

Every demo by Claude Sonnet 5

42 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

Arcade
Game
Clean, polished neon Breakout that renders perfectly with glowing bricks, gradient paddle, particle bursts, and a live in-progress state (score 10, a brick already cleared). Solid physics with side-aware collisions and level scaling, but it's a well-executed take on the most common pick rather than something that clearly tops the best-in-field.
Crypt
Game
Renders with atmospheric HUD, functional minimap showing the maze, and a working torch-lit dungeon architecture in code, but the screenshot is too dark/muddy with no visible walls, torches, or dungeon geometry — the dim ambient/fog balance undersells the crawler and looks flat rather than dramatically lit.
Dogfight
Game
Renders cleanly with a polished HUD, crosshair, and 3D plane in a proper 3D scene, but the screenshot shows the player facing straight down at empty ground with no enemies visible, and the plane model looks flat/basic — functional but generic compared to the field's best.
Doom
Game
Renders a clean raycaster maze with atmospheric red-lit walls, working minimap, HUD health/kills bar, crosshair and a monster visible at screen edge; solid feature set (hitscan shooting, chasing AI, damage flash, touch controls) makes it strong and shippable, though the wall shading looks somewhat flat and the visible monster sprites appear simple, keeping it just below the top.
Dragonflight
Game
Renders a clean 3D neon-ring flight scene with a full HUD (score, distance, speed/Mach, fury bar, radar, touch controls) and a recognizable winged dragon body with glowing spikes. Solid and shippable, but the dragon model reads a bit blobby/awkward from behind and the scene feels sparse compared to the field's best, keeping it short of a task winner.
Dragonrealm
Game
Strong, atmospheric night-time frozen realm — snowy terrain, pine forest, low-poly mountains and a moonlit sky read convincingly as Skyrim-esque, with a fully-modeled sword-wielding character and clean sword-draw/sheath system. Weak points: character proportions are a bit stiff and no dragons or combat depth are visible on-screen, keeping it just below the top tier.
Game
Game
Strong, polished 3D Three.js build with clean neon aesthetic, glowing player orb, colorful octahedron collectibles, spinning obstacles, shadows, starfield, and full HUD/lives/timer loop — clearly shippable. Falls just short of the field's best due to being a fairly familiar collect-and-dodge concept rather than a standout mechanic.
Neonblaster
Game
Strong neon aesthetic with glowing nebulae, layered starfield, polished ship and auto-fire bullets, plus a full feature set (waves, bosses, power-ups, synth sequencer, screen-shake); the screenshot looks clean and on-brief but reads a touch sparse/quiet early on versus the flashiest field entries.
Neoncity
Game
Strong on-brief cyberpunk drive: neon edge strips, cyan dashed lane markers, glowing windowed buildings and clean perspective read convincingly as a neon city. Weakness is the right side feels sparse and the buildings cluster asymmetrically, keeping it just shy of the best in field.
Neonracer 🥈
Game
Strong synthwave aesthetic with clean cyan/magenta neon edges, glowing grid floor, and vivid dual-color vapor trails behind the car — the particle effects are the standout. Slightly generic car model and empty upper sky hold it just below the top mark.
Nordiccrypt
Game
Renders a functional first-person maze with warm torch glow, dust motes, minimap, and clean Nordic-styled UI, but the view is washed-out and flat — walls read as a beige grid without depth, stone/rune detail is barely visible, and it lacks the atmospheric contrast and dungeon menace the brief demands.
Outrun 🥇
Game
Gorgeous, on-brief execution: striped retro sun, parallax mountains, glowing pink/cyan rumble strips and lane markers on a proper pseudo-3D road, plus a detailed neon car and polished CRT scanline/vignette overlays. Speed reads 000 in the shot (idle), but the classic Jake Gordon-style road engine with curves/hills and clean HUD makes this a task winner.
Pool 🥈
Game
Clean 3D render with proper racked triangle, numbered/striped ball textures, cue stick aiming, pockets and power bar—clearly on-brief and polished. Solid physics-oriented setup but visually generic versus a task winner, and the shadowing under the rack looks a bit off.
Racing
Game
Strong, clean third-person render with proper checkered finish line, striped curbs, low-poly trees, HUD, and a functioning minimap showing scattered obstacles; polished and shippable but the ring track and generic box-car keep it just short of the field's best.
Raycaster
Game
Strong, shippable 3D maze: clean rendered walls with lighting/shadows, checkerboard floor, working minimap with player+goal markers, and solid controls/UI. Uses real 3D geometry rather than classic raycasting and looks a bit flat/plain (colored walls without texture), keeping it just shy of the top.
Rpg
Game
Clean render with a readable 3D top-down world—billboard player/slime sprites, HP bar, XP, inventory slots and clear controls all present and on-brief; but the scene feels sparse/empty with sprawling green fields, few obstacles and no visible combat action, keeping it solid-shippable rather than a task winner.
Skyrim
Game
Clean low-poly open-world render with convincing terrain, water, trees, rocks, shadows, and dual controls for desktop/mobile — but it reads more generic 'nature explorer' than Skyrim (no snowy peaks visible, no fantasy/RPG flavor like structures, enemies, or quests), keeping it shippable but short of the task's best.
Twilightvale
Game
UI overlays (title, kills, weather, HP bar, hint) render correctly but the 3D scene is completely black — no terrain, trees, player, or enemies visible, meaning the WebGL world failed to render despite solid source code. A non-rendering core makes this effectively broken for a 3D RPG task.
Voxelcraft
Game
Renders a working voxel world with clean hotbar, crosshair, HUD, and full FPS controls/raycast placement, but the terrain reads flat and washed-out (over-bright water plane dominating the view) and lacks the depth/polish of the best entry.
Landing
Page
Strong dark SaaS aesthetic with polished feature cards, gradient accents, and animated particle canvas, but the large 3D torus renders as a full-screen overlay that collides badly with the headline and center card—hurting readability and looking unintentional in this viewport.
Webos
Page
Renders a polished dark desktop with animated starfield bg, working dock, taskbar pills, mac-style window controls, drag/resize, and functional apps (Notes/Paint/Terminal/About visible). Strong and shippable but visually generic and only shows two overlapping windows—doesn't quite eclipse the field's best 9.0 in flair.
Blackhole
Sim
The lensing shader and event horizon shadow are present with a visible photon ring, but the camera has zoomed far too close, blowing out the disk into overexposed white banding and blocky artifacts on the right that read as broken/glitchy rather than a clean gravitational-lensing visual. The core concept works but the framing and clipping make it look flawed rather than shippable.
Boids
Sim
Strong 3D boids with proper flocking rules, orbit camera, live sliders, and a polished cage/grid/starfield presentation running at 60fps. Colorful cones read clearly but feel slightly sparse/scattered rather than showing tight emergent flocks in this frame, keeping it just short of the top.
Cloth
Sim
Strong Verlet cloth with visible draping folds over the sphere, clean UI and controls, and nice lighting/vertex-color tint. Weakness: the drape reads a bit shroud-like/pointy and the sphere obstacle is fully hidden, so the 'draping over an object' silhouette is less convincing than the best entries.
Fluid 🥈
Sim
Stunning rendered flow-field with rich swirling particle streaks, a clear vortex focal point, and vivid rainbow color mapping over additive-blended trails — genuinely beautiful and clearly on-brief. Only knock is the low 22fps and it's a flow-field trail sim rather than true fluid dynamics, but visually it tops the field.
Fractal
Sim
Renders a clean Julia set with full WebGL interactivity (pan/zoom/toggle/touch/autoplay) and a polished HUD; weak point is the flat pink-black palette which reads as low-detail compared to richer, banded fractal explorers, keeping it below the field's best.
Galaxy 🥇
Sim
Beautiful multi-arm spiral with convincing color gradient (warm core to violet edges), bright glowing bulge, and background starfield—clearly on-brief and polished. Full swirl/orbit/zoom interactivity with a mouse-influence vortex on the particles makes this a task winner.
Orbit
Sim
UI chrome (title, control panel, hint bar) renders cleanly, but the entire 3D canvas is black — no star, bodies, trails, or starfield visible, so the actual N-body simulation fails to render. Bodies:7 counter confirms state exists but nothing draws, indicating a WebGL/camera setup failure.
Particleforge
Sim
Renders a clean, colorful 9000-particle spherical cloud with a glowing gravity core, gradient title, palette swatches, and full interaction wiring (attract/repel/swirl/zoom/reset). Strong polish and on-brief, but the static screenshot reads as a fairly generic scattered blob without visible swirl or dramatic sculpting, keeping it just below the top tier.
Pathtracer 🥇
Sim
Renders a genuine progressively-converged Cornell box with correct red/green colored-wall bleed, a diffuse yellow sphere, and convincing glass and metal spheres showing refraction/reflection at 163 samples — physically-plausible and clearly on-brief. Minor grain and the truncated display material are the only weak points; strong direct-light NEE plus Russian-roulette path tracing make this a task winner.
Reactiondiff
Sim
Strong GPU Gray-Scott sim rendering clean cell/worm Turing structures with a polished glassy control panel, presets, and interactive seeding; slight weakness is somewhat uniform blob patterns rather than more dramatic branching coral, keeping it just under the top.
Solar
Sim
Screenshot shows only the 'Loading Solar System...' overlay on a black background — the actual scene never rendered (likely a CDN/init failure or the loading div was never hidden in the captured frame). Despite reasonable-looking source with orbits, labels, rings and controls, what renders is effectively non-functional.
Wormhole
Sim
Screenshot captured mid-load showing only the 'ENTERING WORMHOLE...' splash on black — no tunnel visible, so the render fails to demonstrate the effect. The source is competent (TubeGeometry curve, scrolling rainbow texture, particles, mouse steering) but the 900ms loading overlay masked the actual scene at capture time, tanking the visible result.
Aurora
Visual
Only the AURORA title and hint text render over a black void — the Three.js scene (aurora, stars, mountains, moon) never appears, likely a fade overlay or render failure. The source is ambitious and well-structured but the actual result is essentially non-rendering.
Fireworks
Visual
Strong 3D scene with starfield, skyline silhouette, additive-blended particle bursts and a polished shimmering title—clearly on-brief and shippable. Particles read slightly blocky/square rather than glowing sparks, and the depth composition feels a touch flat, keeping it just shy of the field's best.
Lavalamp
Visual
The 3D lamp tube renders with a nice warm bulb glow at the base and clean title styling, but the blobs read as faint, washed-out dim smudges rather than glowing morphing lava — the core wax visual is nearly invisible and underlit, leaving it feeling empty compared to the field's best.
Matrix 🥉
Visual
Strong, polished matrix rain with proper katakana glyphs, bright white leading heads, fading trails, and glow — plus solid extras (theme cycling, mouse disturbance, speed). Screenshot shows a captured cyan theme rather than classic green which slightly undercuts the iconic look, but it's clearly on-brief and shippable.
Plasma 🥉
Visual
Gorgeous smooth GLSL plasma with rich rainbow blobs, clean glowing title, and five well-styled palette swatches with clear active state; ripples aren't visible in the still but the code is solid, though the effect reads slightly generic against the very best field entry.
Synthwave
Visual
Gorgeous, on-brief synthwave scene with striped sun, layered mountains, glowing neon title and a warm-to-purple gradient grid that reads beautifully; only knock is a visible rectangular artifact around the sun (the sky plane/glow seam) that slightly breaks the polish.
Terrain 🥉
Visual
Renders a clean, atmospheric procedural landscape with convincing height-based coloring (sand/grass/rock/snow), scattered trees, and fog depth; solid and shippable but the terrain reads somewhat soft/generic and lacks standout visual punch or striking peaks to top the field.
Voxel
Visual
Strong, polished voxel island with clean biome layering (sand/grass/stone/snow), soft shadows, trees, translucent water and clear orbit/zoom/WASD controls — very on-brief. Falls just short of the top: the terrain reads a bit flat/small and the stone plateau looks like a slightly noisy blob rather than dramatic peaks, but it's clearly shippable.
Waves
Visual
Renders a convincing 3D sum-of-sines ocean with height-based foam coloring, floating buoys for scale, orbit/zoom controls, and a clean title overlay. The foam highlights are a bit blown-out and blobby and the specular hotspots read as over-bright white patches rather than crisp crests, keeping it just short of the field's best.

Compare Claude Sonnet 5 against every other model

Every head-to-head featuring Claude Sonnet 5. Verdicts shown for scored pairs.

Claude Sonnet 5 vs Fusion: Fusion leads 33–7
Claude Sonnet 5 vs Hermes MoA: Hermes MoA leads 34–8
Claude Sonnet 5 vs Grok: Grok leads 22–15
Claude Sonnet 5 vs MiniMax M3: MiniMax M3 leads 23–17
Claude Sonnet 5 vs Fugu Ultra: Fugu Ultra leads 25–15
Claude Sonnet 5 vs GLM-5.2: Claude Sonnet 5 leads 22–19
Claude Sonnet 5 vs Fugu Mini: Fugu Mini leads 17–16
Claude Sonnet 5 vs Opus 4.8: Claude Sonnet 5 leads 25–16
Claude Sonnet 5 vs Kimi K2.7: Claude Sonnet 5 leads 11–7
Claude Sonnet 5 vs Qwable 5 27B Coder: Claude Sonnet 5 leads 25–15
Claude Sonnet 5 vs Qwen 3.7: Claude Sonnet 5 leads 31–11
Claude Sonnet 5 vs Qwythos 9B: Claude Sonnet 5 leads 39–0
Claude Sonnet 5 vs LongCat-2.0: LongCat-2.0 leads 4–0
Claude Sonnet 5 vs Gemma-4 12B Coder: Claude Sonnet 5 leads 4–1
Claude Sonnet 5 vs Kimi K2.7 · Fast: 42 shared tasks · unscored
Claude Sonnet 5 vs Kimi K2.7 · No-Think: 42 shared tasks · unscored
Claude Sonnet 5 vs Kimi K2.7 · Quality: 42 shared tasks · unscored
Claude Sonnet 5 vs Ornith 1.0: 42 shared tasks · unscored
Claude Sonnet 5 vs Claude Fable 5: Reference-only
Claude Sonnet 5 vs Claude Mythos 5: Reference-only
Claude Sonnet 5 vs Kilo Code: Reference-only
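
For a quick read on the scored matchups listed above, here is a rough sketch that parses each verdict string and counts how many matchups Claude Sonnet 5 leads versus trails. The verdict strings are copied from this page; the names `VERDICTS`, `parse_verdict`, and `summarize` are hypothetical helpers of mine, and I am assuming the two numbers in each verdict are the leader's and the trailer's tallies, since the page does not define them.

```python
import re

ME = "Claude Sonnet 5"

# (opponent, verdict) pairs copied from the scored matchups on this page.
# Unscored and reference-only pairs are left out on purpose.
VERDICTS = [
    ("Fusion", "Fusion leads 33–7"),
    ("Hermes MoA", "Hermes MoA leads 34–8"),
    ("Grok", "Grok leads 22–15"),
    ("MiniMax M3", "MiniMax M3 leads 23–17"),
    ("Fugu Ultra", "Fugu Ultra leads 25–15"),
    ("GLM-5.2", "Claude Sonnet 5 leads 22–19"),
    ("Fugu Mini", "Fugu Mini leads 17–16"),
    ("Opus 4.8", "Claude Sonnet 5 leads 25–16"),
    ("Kimi K2.7", "Claude Sonnet 5 leads 11–7"),
    ("Qwable 5 27B Coder", "Claude Sonnet 5 leads 25–15"),
    ("Qwen 3.7", "Claude Sonnet 5 leads 31–11"),
    ("Qwythos 9B", "Claude Sonnet 5 leads 39–0"),
    ("LongCat-2.0", "LongCat-2.0 leads 4–0"),
    ("Gemma-4 12B Coder", "Claude Sonnet 5 leads 4–1"),
]

# The separator between the two numbers is an en dash, as printed on the page.
VERDICT_RE = re.compile(r"^(?P<leader>.+) leads (?P<hi>\d+)–(?P<lo>\d+)$")


def parse_verdict(verdict: str) -> tuple[str, int, int]:
    """Split 'Grok leads 22–15' into (leader, leader_tally, trailer_tally)."""
    match = VERDICT_RE.match(verdict)
    if match is None:
        raise ValueError(f"unrecognized verdict: {verdict!r}")
    return match["leader"], int(match["hi"]), int(match["lo"])


def summarize(verdicts: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (matchups ME leads, matchups ME trails)."""
    led = trailed = 0
    for opponent, verdict in verdicts:
        leader, _, _ = parse_verdict(verdict)
        if leader == ME:
            led += 1
        elif leader == opponent:
            trailed += 1
        else:
            raise ValueError(f"leader {leader!r} is not part of this matchup")
    return led, trailed


if __name__ == "__main__":
    led, trailed = summarize(VERDICTS)
    print(f"{ME} leads {led} scored matchups and trails {trailed}")
```

On the 14 scored pairs this comes out to 7 led and 7 trailed, so the head-to-head record is an even split even though the individual margins vary a lot.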

See all 66 comparisons across every model →

Read more on agentos.guide: /sonnet-5-vs-glm-5-2

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+ founders
258 documented wins
38 countries
$59/mo, monthly