Meituan

LongCat-2.0

The open 1.6T MoE that builds — a frontier coder trained on non-Nvidia ASIC superpods.

Context1,000,000 tokens (LongCat Sparse Attention)

PricingOpen weights · free web chat · API

Tasks tested4

Avg score8.12/10 average

Medals🥇0 🥈1 🥉2

Release2026-06

Official sitelongcat.chat ↗

Official vendor source

LongCat-2.0 is built by Meituan — see the vendor's own product page, pricing, and docs at longcat.chat.

Visit longcat.chat →

Reference benchmarks for LongCat-2.0

These are external benchmarks I pulled from the source comparison guides on agentos.guide — SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, vendor-reported coding scores. They are not goldiebench medal scores (those come only from same-prompt one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for LongCat-2.0 is honest about what's measured.

Terminal-Bench 2.1

70.8

source: /longcat-2-0

SWE-bench Multilingual

77.3

source: /longcat-2-0

BrowseComp

79.9

source: /longcat-2-0

GPQA-diamond

88.9

source: /longcat-2-0

IFEval

90.0

source: /longcat-2-0

What is LongCat-2.0?

LongCat-2.0 is the Meituan frontier model with a 1,000,000 tokens (LongCat Sparse Attention) context window, released 2026-06. Tagline: The open 1.6T MoE that builds — a frontier coder trained on non-Nvidia ASIC superpods.. Official source: longcat.chat.

Pricing detail. LongCat-2.0 is open-sourced (weights on Hugging Face + GitHub) and served via the longcat.chat web chat plus an OpenAI-compatible API (model id 'LongCat-2.0' at api.longcat.chat/openai/v1). It's a 1.6T-parameter MoE with ~48B activated per token, trained entirely on AI ASIC superpods (>50K accelerators, 35T+ tokens, no rollbacks). Note: the direct API key we were handed shipped with zero token quota ('Token 额度不足'), so every build here was run through the free web chat. Vendor: Meituan.

How I use it inside the Agent OS. Run through the free longcat.chat web chat (the API key had no token quota), driven with the local-model-tester GoldieBench prompts; every build render-verified + playtested (verify-move.js: walks + looks + zero errors) before scoring. Slots into the Agent OS as an open frontier coder via its OpenAI-compatible API or the Claude Code / OpenClaw / Hermes harnesses.

What I built with LongCat-2.0

Every model on Goldie Bench gets the same fixed prompt set — one shot, single HTML file out — and I score the result 0–10 inside the Agent Operating System. Here's what LongCat-2.0 shipped on the bench: 4 one-shot demos across 1,000,000 tokens (LongCat Sparse Attention) of context. Of those, 4 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

One-shot GoldieBench: 3 of 4 flawless playable 3D builds (Dragon Realm 8.5, Skyrim 8.5, Crypt 8.0); Voxel Craft built one-shot but needed a 1-line camera fix (7.5) — avg 8.1
1.6T-param MoE (~48B active/token) with LongCat Sparse Attention + a 1M-token window — built for long-horizon agentic + coding tasks
Open weights, deeply integrated with Claude Code, OpenClaw and Hermes — a free frontier-class coder to slot into the Agent OS

Trade-offs

The direct API key we were given had near-zero token quota, so we ran it through the free web chat rather than the API
One camera-framing miss: Voxel Craft loaded facing away from the world (sky-only) until a one-line yaw/pitch patch pointed it at the terrain

Best for

One-shot single-file 3D / HTML / game builds inside the Agent OS
Long-context, repo-level edits + automated agentic task execution
A free, open, frontier-class coder to drop into the Model-Proof System

Every demo by LongCat-2.0

4 live demos, sorted by category. Click any tile to play the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

▶ LIVE

Crypt 🥉

Game

One-shot 9KB torch-lit stone dungeon corridor — pillars, barrels, a chest, 6+ flickering torch PointLights, fog. Real WASD+mouse controls. verify-move: walks+looks, 0 errors. Lit + atmospheric (a touch over-bright orange).

▶ LIVE

Dragonrealm 🥉

Game

One-shot 15KB three.js snow open-world — snow-capped mountains + 30 low-poly pines, 3000-particle falling snow, first-person glowing sword, fog. Real WASD+mouse+sprint controls, terrain-follow. verify-move: walks+looks, canvas 1440x810, 0 errors. Flawless first try — no patch.

▶ LIVE

Skyrim 🥈

Game

One-shot 23KB open-world explorer (the richest of the four) — rolling displaced terrain, snow mountains, a stone watchtower, 20+ conifers, boulders, grass, clouds, and terrain-height following. Real WASD+mouse. verify-move: walks+looks, 0 errors.

▶ LIVE

Voxelcraft

Game

One-shot 9KB Minecraft-style voxel world — 16x16 grass/dirt/stone cubes, voxel trees, day/night sky, raycast break+place, real WASD+mouse. verify-move: walks+looks, 0 errors. Built the full world one-shot but the initial camera yaw faced away (sky-only) — a one-line framing patch (yaw 0, pitch -0.5) pointed it at the terrain.

every demo, in a grid · click any one to play

Compare LongCat-2.0 against every other model

Every head-to-head featuring LongCat-2.0. Verdicts shown for scored pairs.

LongCat-2.0 vs Fusion

Fusion leads 4–0

LongCat-2.0 vs Hermes MoA

LongCat-2.0 leads 3–1

LongCat-2.0 vs Grok

Grok leads 1–0

LongCat-2.0 vs MiniMax M3

MiniMax M3 leads 3–0

LongCat-2.0 vs Fugu Ultra

LongCat-2.0 leads 3–1

LongCat-2.0 vs GLM-5.2

LongCat-2.0 leads 2–1

LongCat-2.0 vs Fugu Mini

LongCat-2.0 leads 2–0

LongCat-2.0 vs Opus 4.8

LongCat-2.0 leads 3–1

LongCat-2.0 vs Kimi K2.7

4 shared tasks · unscored

LongCat-2.0 vs Claude Sonnet 5

LongCat-2.0 leads 4–0

LongCat-2.0 vs Qwable 5 27B Coder

LongCat-2.0 leads 4–0

LongCat-2.0 vs Qwen 3.7

LongCat-2.0 leads 3–0

LongCat-2.0 vs Qwythos 9B

LongCat-2.0 leads 4–0

LongCat-2.0 vs Gemma-4 12B Coder

4 shared tasks · unscored

LongCat-2.0 vs Kimi K2.7 · Fast

4 shared tasks · unscored

LongCat-2.0 vs Kimi K2.7 · No-Think

4 shared tasks · unscored

LongCat-2.0 vs Kimi K2.7 · Quality

4 shared tasks · unscored

LongCat-2.0 vs Ornith 1.0

4 shared tasks · unscored

LongCat-2.0 vs Claude Fable 5

Reference-only

LongCat-2.0 vs Claude Mythos 5

Reference-only

LongCat-2.0 vs Kilo Code

Reference-only

See all 66 comparisons across every model →

Quick pill index

Direct comparisons against every other scored model on the bench:

LongCat-2.0 vs Fusion LongCat-2.0 vs Hermes MoA LongCat-2.0 vs Grok LongCat-2.0 vs MiniMax M3 LongCat-2.0 vs Fugu Ultra LongCat-2.0 vs GLM-5.2 LongCat-2.0 vs Fugu Mini LongCat-2.0 vs Opus 4.8 LongCat-2.0 vs Kimi K2.7 LongCat-2.0 vs Claude Sonnet 5 LongCat-2.0 vs Qwable 5 27B Coder LongCat-2.0 vs Qwen 3.7 LongCat-2.0 vs Qwythos 9B LongCat-2.0 vs Gemma-4 12B Coder

Read more on agentos.guide: /longcat-2-0

The same stack Julian uses

Run this stack yourself.

Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.

3,600+founders

258documented wins

38countries

$59/momonthly

Join AIPB · $59/mo → Read the Agent OS guides →