Sakana AI

Fugu Ultra

Sakana's multi-agent answer to Fusion — frontier ensemble without single-vendor risk.

Context272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates.
Pricing$5 / 1M input · $30 / 1M output (Fugu Ultra)
Tasks tested5
Avg score7.60/10 average
Medals🥇3 🥈1 🥉0
Release2026-06-15

Reference benchmarks for Fugu Ultra

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for Fugu Ultra is honest about what's measured.

SWE Bench Pro
73.7
GPQA-Diamond
95.5
MRCRv2
93.6

What is Fugu Ultra?

Fugu Ultra is the Sakana AI frontier model with a 272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates. context window, released 2026-06-15. Tagline: Sakana's multi-agent answer to Fusion — frontier ensemble without single-vendor risk..

Pricing detail. Sakana's multi-agent orchestration: a single API call internally dispatches to multiple frontier models and synthesises the answer. Subscription plans run $20-$200/mo (Standard / Pro / Max); PAYG is $5/M input + $30/M output for Fugu Ultra. Direct competitor to OpenRouter Fusion's panel approach.

How I use it inside the Agent OS. Dispatched from Agent OS as the panel-ensemble alternative to OpenRouter Fusion. Bench scored by Claude judge against the same 42 prompts as every other model.

What I built with Fugu Ultra

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what Fugu Ultra shipped on the bench: 5 one-shot demos across 272,000 tokens with the standard rate. Calls exceeding 272K context are billed at the higher 'long-context' rates. of context. Of those, 5 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

  • SWE Bench Pro 73.7 · GPQA-D 95.5 · MRCRv2 93.6 — Sakana's published frontier-tier benchmark scores
  • Vendor-agnostic ensemble — opt out of specific providers for compliance / export-control
  • OpenAI-compatible API at api.sakana.ai — drop-in for existing tooling

Trade-offs

  • Panel orchestration adds latency — even a 'pong' burns ~2k orchestration tokens
  • Newer than Fusion; less community calibration on long-tail prompts

Best for

  • Teams that want Fusion-class quality but need a different vendor risk profile
  • Operators avoiding export-controlled providers (Sakana emphasises this in their pitch)
  • Deep-research workflows where ensemble verdicts beat single-model answers

Every demo by Fugu Ultra

5 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders
258documented wins
38countries
$100k+/mocommunity MRR