Sakana AI

Fugu Ultra

Sakana's multi-agent answer to Fusion — frontier ensemble without single-vendor risk.

Context272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates.

Pricing$5 / 1M input · $30 / 1M output (Fugu Ultra)

Tasks tested5

Avg score7.60/10 average

Medals🥇3 🥈1 🥉0

Release2026-06-15

Reference benchmarks for Fugu Ultra

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for Fugu Ultra is honest about what's measured.

SWE Bench Pro

73.7

source: /sakana.ai/fugu

GPQA-Diamond

95.5

source: /sakana.ai/fugu

MRCRv2

93.6

source: /sakana.ai/fugu

What is Fugu Ultra?

Fugu Ultra is the Sakana AI frontier model with a 272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates. context window, released 2026-06-15. Tagline: Sakana's multi-agent answer to Fusion — frontier ensemble without single-vendor risk..

Pricing detail. Sakana's multi-agent orchestration: a single API call internally dispatches to multiple frontier models and synthesises the answer. Subscription plans run $20-$200/mo (Standard / Pro / Max); PAYG is $5/M input + $30/M output for Fugu Ultra. Direct competitor to OpenRouter Fusion's panel approach.

How I use it inside the Agent OS. Dispatched from Agent OS as the panel-ensemble alternative to OpenRouter Fusion. Bench scored by Claude judge against the same 42 prompts as every other model.

What I built with Fugu Ultra

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what Fugu Ultra shipped on the bench: 5 one-shot demos across 272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates. of context. Of those, 5 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

SWE Bench Pro 73.7 · GPQA-D 95.5 · MRCRv2 93.6 — Sakana's published frontier-tier benchmark scores
Vendor-agnostic ensemble — opt out of specific providers for compliance / export-control
OpenAI-compatible API at api.sakana.ai — drop-in for existing tooling

Trade-offs

Panel orchestration adds latency — even a 'pong' burns ~2k orchestration tokens
Newer than Fusion; less community calibration on long-tail prompts

Best for

Teams that want Fusion-class quality but need a different vendor risk profile
Operators avoiding export-controlled providers (Sakana emphasises this in their pitch)
Deep-research workflows where ensemble verdicts beat single-model answers

Every demo by Fugu Ultra

5 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

▶ LIVE

Raycaster 🥇

Game

26KB canvas raycaster with WASD + mouse-look + distance fog + weapon bob. Clean implementation, comparable to Fusion's 17KB on the same prompt. ~$0.35 per call — roughly 1/4 the cost of Fusion.

▶ LIVE

Landing 🥇

Page

Sakana Fugu Ultra shipped a 32KB Apple-keynote landing — bigger than Fusion's 20KB attempt at the same prompt. Animated mesh gradient, multi-section, polished. $0.32 vs Fusion's $1.30 for the same output — 4× cheaper, denser result.

▶ LIVE

Galaxy 🥇

Sim

26KB three.js spiral galaxy with drag-to-orbit + dust lanes + bloom. Comparable visual quality to Fusion's 14KB attempt with more polish on the camera UI. ~$0.24 per call.

▶ LIVE

Orbit 🥈

Sim

26KB inner-solar-system orbit map with a glassmorphic info panel, kicker badge, blurred backdrop, hover cards. Cleaner UI than Fusion's same-task attempt — beats it on polish.

▶ LIVE

Voxel

Visual

TRUNCATED — Fugu hit my 16K max_tokens ceiling mid-particle-spawn function. The 28KB it shipped has rich setup (HUD, lane geometry, obstacle/coin spawning, particle pool) but NO animation loop and NO input handlers, plus an unclosed <script> tag. Loads to a static scene; doesn't play. Penalised for shipping a non-runnable artefact. Fusion's voxel attempt on the same prompt: 19KB but complete and playable.

Compare Fugu Ultra against every other model

Every head-to-head featuring Fugu Ultra. Verdicts shown for scored pairs.

Fugu Ultra vs Opus 4.8

Opus 4.8 leads 2–1

Fugu Ultra vs GLM-5.2

Fugu Ultra leads 3–1

Fugu Ultra vs Grok

Fugu Ultra leads 2–1

Fugu Ultra vs Fusion

Tied 1–1

Fugu Ultra vs MiniMax M3

Fugu Ultra leads 3–1

Fugu Ultra vs Qwen 3.7

Fugu Ultra leads 2–1

Fugu Ultra vs Kimi K2.7

Fugu Ultra leads 3–1

Fugu Ultra vs Fugu Mini

Fugu Ultra leads 1–0

Fugu Ultra vs Gemma-4 12B Coder

Fugu Ultra leads 2–0

Fugu Ultra vs Kimi K2.7 · Fast

1 shared tasks · unscored

Fugu Ultra vs Kimi K2.7 · No-Think

1 shared tasks · unscored

Fugu Ultra vs Kimi K2.7 · Quality

1 shared tasks · unscored

Fugu Ultra vs Claude Fable 5

Reference-only

Fugu Ultra vs Claude Mythos 5

Reference-only

Fugu Ultra vs Kilo Code

Reference-only

See all 66 comparisons across every model →

Quick pill index

Direct comparisons against every other scored model on the bench:

Fugu Ultra vs Opus 4.8 Fugu Ultra vs GLM-5.2 Fugu Ultra vs Grok Fugu Ultra vs Fusion Fugu Ultra vs MiniMax M3 Fugu Ultra vs Qwen 3.7 Fugu Ultra vs Kimi K2.7 Fugu Ultra vs Fugu Mini Fugu Ultra vs Gemma-4 12B Coder

Read more on agentos.guide: /sakana-fugu-vs-fusion

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders

258documented wins

38countries

$100k+/mocommunity MRR

Join AIPB · $59/mo → Read the Agent OS guides →