Moonshot AI

Kimi K2.7

The heavy lifter — frontier coder at flat-rate.

Context: 256,000 tokens

Pricing: Flat plan (no per-token bill)

Tasks tested: 47

Avg score: 7.46/10

Medals: 🥇1 🥈2 🥉0

Release: 2026-06

Official site: kimi.com

Official vendor source

Kimi K2.7 is built by Moonshot AI — see the vendor's own product page, pricing, and docs at kimi.com.


Reference benchmarks for Kimi K2.7

These are external benchmarks I pulled from the source comparison guides on agentos.guide: SWE-bench Verified, DRACO, Kilo plan rubric, build-time measurements, and vendor-reported coding scores. They are not Goldie Bench medal scores (those come only from same-prompt, one-shot creative coding tasks in the matrix). I surface them here so the spec sheet for Kimi K2.7 is honest about what's measured.

SWE-bench Verified (K2.6 — K2.7 internal)

80.2%

source: /three-dragons

Cost

Pennies per build

source: /three-dragons

What is Kimi K2.7?

Kimi K2.7 is Moonshot AI's frontier model, released 2026-06 with a 256,000-token context window. I bill it here as the heavy lifter: a frontier coder at a flat rate. Official source: kimi.com.

Pricing detail. Available on Moonshot's flat-rate subscription plan — no per-token billing for individual builders. The plan covers all three speed modes (Fast, No-Think, Quality). Vendor: Moonshot AI (moonshot.ai), based in Beijing.

How I use it inside the Agent OS. Wired into the Agent OS as the heavy-lifter for game/sim prototypes and Kanban-dispatched code work. Mode toggled per task: Quality for one-shot games, Fast for short bursts.
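The per-task toggle can be pictured as a small lookup table. This is a minimal sketch, not the actual Agent OS wiring: the task names (`one_shot_game`, `short_burst`), the `pick_mode` helper, and the choice of Quality as the fallback are all assumptions made for illustration. Only the three mode names and the two pairings come from the description above.

```python
from enum import Enum


class Mode(Enum):
    """The three speed modes named on the plan."""
    FAST = "Fast"
    NO_THINK = "No-Think"
    QUALITY = "Quality"


# Hypothetical task kinds. Only these two pairings are stated in the text.
MODE_BY_TASK = {
    "one_shot_game": Mode.QUALITY,
    "short_burst": Mode.FAST,
}


def pick_mode(task_kind: str, default: Mode = Mode.QUALITY) -> Mode:
    """Return the mode for a task kind, falling back to an assumed default."""
    return MODE_BY_TASK.get(task_kind, default)


if __name__ == "__main__":
    for kind in ("one_shot_game", "short_burst", "refactor"):
        print(kind, "->", pick_mode(kind).value)
```

Keeping the mapping in one table means a dispatcher only has to pass along the task kind, and a new pairing is a one-line change.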

What I built with Kimi K2.7

Every model on Goldie Bench gets the same fixed prompt set (one shot, single HTML file out), and I score the result 0–10 inside the Agent Operating System. Kimi K2.7, with its 256,000-token context window, shipped 47 one-shot demos on the bench. Of those, 25 are scored against the field with my honest 0–10 from the source guides at agentos.guide.

Strengths

Best-of-three on interactive games — raycaster, DOOM, monster AI
Three speed modes (Fast / No-Think / Quality) you can swap per task
Flat-rate plan eliminates the per-token meter, so iteration is free

Trade-offs

Plays plainest on abstract visual prompts — synthwave grids, fluid sims, aurora — where GLM and Opus add more flair
Bronze average on Goldie Bench despite the gold-medal games: its visual builds are accurate but understated

Best for

Interactive game prototypes you want shippable on the first prompt
High-iteration agent loops where per-token cost would dominate
Long-context refactors using the 256K window inside Agent OS
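For the long-context case, a quick pre-flight check helps decide whether a refactor fits in one request. The sketch below is a rough estimate under stated assumptions: the 4-characters-per-token ratio is a common rule of thumb, not Kimi's real tokenizer, and the 16,000-token output reserve and both function names are hypothetical. Only the 256,000-token window comes from the spec above.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # from the spec sheet
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count by ceiling-dividing character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserve_for_output: int = 16_000) -> bool:
    """Check whether all texts plus an assumed output reserve fit in the window."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    small_repo = ["x" * 400_000]    # about 100,000 estimated tokens
    large_repo = ["x" * 1_000_000]  # about 250,000 estimated tokens
    print(fits_in_context(small_repo))
    print(fits_in_context(large_repo))
```

Because the ratio is only a heuristic, treat a result near the limit as a signal to split the refactor into batches.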

Every benchmark — Kimi K2.7's full scorecard

All 25 scored tasks, best first, with the judge's 0–10 on the same rubric as the whole field. A full editorial breakdown with judge quotes and sourced outside research is in the Kimi K2.7 deep dive.

Every demo by Kimi K2.7

47 live demos, sorted by category. Each is the actual one-shot result. Verdicts and 0–10 scores are pulled from the source guides where I posted them publicly.

