Meituan

LongCat-2.0

The open 1.6T MoE that builds — a frontier coder trained on non-Nvidia ASIC superpods.

Context1,000,000 tokens (LongCat Sparse Attention)
PricingOpen weights · free web chat · API
Tasks tested4
Avg score8.12/10 average
Medals🥇0 🥈1 🥉2
Release2026-06
Official sitelongcat.chat ↗
Official vendor source
LongCat-2.0 is built by Meituan — see the vendor's own product page, pricing, and docs at longcat.chat.
Visit longcat.chat →

Reference benchmarks for LongCat-2.0

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for LongCat-2.0 is honest about what's measured.

Terminal-Bench 2.1
70.8
source: /longcat-2-0
SWE-bench Multilingual
77.3
source: /longcat-2-0
BrowseComp
79.9
source: /longcat-2-0
GPQA-diamond
88.9
source: /longcat-2-0
IFEval
90.0
source: /longcat-2-0

What is LongCat-2.0?

LongCat-2.0 is the Meituan frontier model with a 1,000,000 tokens (LongCat Sparse Attention) context window, released 2026-06. Tagline: The open 1.6T MoE that builds — a frontier coder trained on non-Nvidia ASIC superpods.. Official source: longcat.chat.

Pricing detail. LongCat-2.0 is open-sourced (weights on Hugging Face + GitHub) and served via the longcat.chat web chat plus an OpenAI-compatible API (model id 'LongCat-2.0' at api.longcat.chat/openai/v1). It's a 1.6T-parameter MoE with ~48B activated per token, trained entirely on AI ASIC superpods (>50K accelerators, 35T+ tokens, no rollbacks). Note: the direct API key we were handed shipped with zero token quota ('Token 额度不足'), so every build here was run through the free web chat. Vendor: Meituan.

How I use it inside the Agent OS. Run through the free longcat.chat web chat (the API key had no token quota), driven with the local-model-tester GoldieBench prompts; every build render-verified + playtested (verify-move.js: walks + looks + zero errors) before scoring. Slots into the Agent OS as an open frontier coder via its OpenAI-compatible API or the Claude Code / OpenClaw / Hermes harnesses.

What I built with LongCat-2.0

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what LongCat-2.0 shipped on the bench: 4 one-shot demos across 1,000,000 tokens (LongCat Sparse Attention) of context. Of those, 4 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

  • One-shot GoldieBench: 3 of 4 flawless playable 3D builds (Dragon Realm 8.5, Skyrim 8.5, Crypt 8.0); Voxel Craft built one-shot but needed a 1-line camera fix (7.5) — avg 8.1
  • 1.6T-param MoE (~48B active/token) with LongCat Sparse Attention + a 1M-token window — built for long-horizon agentic + coding tasks
  • Open weights, deeply integrated with Claude Code, OpenClaw and Hermes — a free frontier-class coder to slot into the Agent OS

Trade-offs

  • The direct API key we were given had near-zero token quota, so we ran it through the free web chat rather than the API
  • One camera-framing miss: Voxel Craft loaded facing away from the world (sky-only) until a one-line yaw/pitch patch pointed it at the terrain

Best for

  • One-shot single-file 3D / HTML / game builds inside the Agent OS
  • Long-context, repo-level edits + automated agentic task execution
  • A free, open, frontier-class coder to drop into the Model-Proof System

Compare LongCat-2.0 against every other model

Every head-to-head featuring LongCat-2.0. Verdicts shown for scored pairs.

LongCat-2.0 vs Fusion
Fusion leads 4–0
LongCat-2.0 vs Hermes MoA
LongCat-2.0 leads 3–1
LongCat-2.0 vs Grok
Grok leads 1–0
LongCat-2.0 vs MiniMax M3
MiniMax M3 leads 3–0
LongCat-2.0 vs Fugu Ultra
LongCat-2.0 leads 3–1
LongCat-2.0 vs GLM-5.2
LongCat-2.0 leads 2–1
LongCat-2.0 vs Fugu Mini
LongCat-2.0 leads 2–0
LongCat-2.0 vs Opus 4.8
LongCat-2.0 leads 3–1
LongCat-2.0 vs Kimi K2.7
4 shared tasks · unscored
LongCat-2.0 vs Claude Sonnet 5
LongCat-2.0 leads 4–0
LongCat-2.0 vs Qwable 5 27B Coder
LongCat-2.0 leads 4–0
LongCat-2.0 vs Qwen 3.7
LongCat-2.0 leads 3–0
LongCat-2.0 vs Qwythos 9B
LongCat-2.0 leads 4–0
LongCat-2.0 vs Gemma-4 12B Coder
4 shared tasks · unscored
LongCat-2.0 vs Kimi K2.7 · Fast
4 shared tasks · unscored
LongCat-2.0 vs Kimi K2.7 · No-Think
4 shared tasks · unscored
LongCat-2.0 vs Kimi K2.7 · Quality
4 shared tasks · unscored
LongCat-2.0 vs Ornith 1.0
4 shared tasks · unscored
LongCat-2.0 vs Claude Fable 5
Reference-only
LongCat-2.0 vs Claude Mythos 5
Reference-only
LongCat-2.0 vs Kilo Code
Reference-only

See all 66 comparisons across every model →

Quick pill index

Direct comparisons against every other scored model on the bench:

LongCat-2.0 vs Fusion LongCat-2.0 vs Hermes MoA LongCat-2.0 vs Grok LongCat-2.0 vs MiniMax M3 LongCat-2.0 vs Fugu Ultra LongCat-2.0 vs GLM-5.2 LongCat-2.0 vs Fugu Mini LongCat-2.0 vs Opus 4.8 LongCat-2.0 vs Kimi K2.7 LongCat-2.0 vs Claude Sonnet 5 LongCat-2.0 vs Qwable 5 27B Coder LongCat-2.0 vs Qwen 3.7 LongCat-2.0 vs Qwythos 9B LongCat-2.0 vs Gemma-4 12B Coder

Read more on agentos.guide: /longcat-2-0

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders
258documented wins
38countries
$59/momonthly