Best AI model for Blackhole

Q: What's the best AI model for Blackhole?

Fusion. Its code claims to back-trace photons through a curved-space metric to produce actual gravitational lensing, with the disk's far side lifted over and under the shadow. The loading screen reads "computing space-time metric…", and a range-slider parameter panel controls spin, disk tilt, and exposure. Judged by code structure, it is the most ambitious Blackhole attempt.
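For readers unfamiliar with the term, here is a minimal sketch of what photon back-tracing can look like in the simplest case. It is not Fusion's code. It assumes a non-spinning (Schwarzschild) black hole in units where G = c = 1, so it ignores the spin slider entirely, and it integrates the standard photon orbit equation d²u/dφ² = 3Mu² − u with u = 1/r. The function name `trace_photon` and its parameters are hypothetical.

```python
import math


def trace_photon(b, M=1.0, dphi=1e-3, max_phi=20 * math.pi):
    """Trace a light ray with impact parameter b past a Schwarzschild black hole.

    Returns the deflection angle in radians, or None if the ray crosses the
    horizon (r = 2M) or has not escaped by max_phi (treated as captured here).
    """
    def deriv(u, v):
        # u = 1/r, v = du/dphi; photon orbit equation: d2u/dphi2 = 3*M*u^2 - u
        return v, 3.0 * M * u * u - u

    # Ray arrives from infinity: u = 0, du/dphi = 1/b
    u, v, phi = 0.0, 1.0 / b, 0.0
    while phi < max_phi:
        # One classical Runge-Kutta (RK4) step in phi
        k1u, k1v = deriv(u, v)
        k2u, k2v = deriv(u + 0.5 * dphi * k1u, v + 0.5 * dphi * k1v)
        k3u, k3v = deriv(u + 0.5 * dphi * k2u, v + 0.5 * dphi * k2v)
        k4u, k4v = deriv(u + dphi * k3u, v + dphi * k3v)
        u_new = u + dphi * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
        v_new = v + dphi * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0

        if u_new >= 1.0 / (2.0 * M):
            return None  # crossed the event horizon: this pixel is shadow
        if u_new <= 0.0:
            # Ray is back at infinity; interpolate the zero crossing of u
            frac = u / (u - u_new)
            return phi + frac * dphi - math.pi  # a straight line sweeps pi
        u, v, phi = u_new, v_new, phi + dphi
    return None  # assumption: rays still orbiting at max_phi count as captured
```

A renderer built on this idea would fire one such ray per screen pixel and colour it by where the ray ends up (shadow, disk, or background sky). In this sketch, rays with b below the critical value 3√3·M fall in, and far-away rays bend by roughly 4M/b.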

Q: How many AI models can do Blackhole?

23 models on Goldie Bench have attempted Blackhole: Fusion, Kimi K3, Opus 4.8, Claude Opus 5, DeepSeek V4 Pro, DeepSeek V4 Flash, Claude Fable 5, Fugu Ultra, Fugu Mini, Gemini 3.6 Flash, GLM-5.2, GPT-5.6 Sol, Grok, Inkling, Kimi K2.7, Kimi K2.7 · Fast, Kimi K2.7 · No-Think, Kimi K2.7 · Quality, MiniMax M3, Hermes MoA, Qwen 3.8, Qwen 3.7, Claude Sonnet 5.

Black Hole — gravitational lensing visualisation.

🥇

Best model for Blackhole

Fusion

"Photon back-tracing through a curved-space metric (claims as much in the code) for actual gravitational lensing, with the disk's far side lifted over and under the shadow. Loading screen says 'computing space-time metric…' Range-slider parameter panel for spin/disk tilt/exposure. Most ambitious blackhole attempt by code structure."

The prompt — what I asked each model

Every model on this page got the same fixed prompt inside the Agent Operating System: Black Hole — gravitational lensing visualisation.

Single HTML file out. No iteration. No "best of N." No examples in the system prompt. Whatever each model produced on the first run is what's on this page. 23 models have attempted it so far; the full list is in the Q&A above.

What counts as winning here

This is the sim category on Goldie Bench. The question isn't "did the model write code that compiles" — the question is "did the model ship a thing you'd actually use." For Blackhole that means three things, in order:

1. Does it run? Drop the .html file in a browser. If it opens to a broken page, it scores zero on the first axis.
2. Did it hit the brief? The prompt asks for a specific thing. A model that ships a different thing, however polished, gets docked on the brief axis.
3. Does it look good? Visual polish, motion, interactivity, attention to detail. This is where the difference between gold and silver usually lives.

The final score is the average of my honest 0–10 ratings on the three axes. Across the 18 models I've scored on this task so far (of the 23 that have attempted it), the average score is 7.45/10.
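The arithmetic behind that final score can be sketched as below. This is an illustration, not the actual scoring tool: it assumes the three axes are weighted equally, and the function name `final_score` and the example numbers are made up.

```python
from statistics import mean


def final_score(runs: float, brief: float, looks: float) -> float:
    """Average three 0-10 axis ratings into one 0-10 final score.

    Assumption: equal weighting of the three axes (runs, brief, looks).
    """
    axes = (runs, brief, looks)
    for rating in axes:
        if not 0 <= rating <= 10:
            raise ValueError(f"axis rating out of range 0-10: {rating}")
    return round(mean(axes), 2)


# Hypothetical ratings, not real results from this page.
print(final_score(10, 8, 6))
# A page that fails to open scores zero on the first axis only.
print(final_score(0, 9, 9))
```

Under this reading, a broken page drags the average down but does not zero it out, since the other two axes are still rated.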

Every model's attempt — ranked by my 0–10 score

Models are ranked by score, with medals for the top three (highest score = 🥇 gold, second = 🥈 silver, third = 🥉 bronze). Click any tile to play that model's actual one-shot HTML.
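The medal rule above can be sketched as a sort followed by labelling the top three. This is a guess at the logic, not the site's code: `assign_medals` is a hypothetical name, the model names and scores are placeholders, and ties are assumed to keep their input order.

```python
MEDALS = ["🥇", "🥈", "🥉"]


def assign_medals(scores: dict[str, float]) -> list[tuple[str, str, float]]:
    """Rank models by score, highest first, and tag the top three with medals.

    Assumption: ties keep their input order, because Python's sort is stable.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (MEDALS[place] if place < len(MEDALS) else "", name, score)
        for place, (name, score) in enumerate(ranked)
    ]


# Placeholder names and scores, not real results from this page.
example = {"model_a": 8.0, "model_b": 9.5, "model_c": 7.0, "model_d": 6.5}
for medal, name, score in assign_medals(example):
    print(medal, name, score)
```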
