All Alibaba (Qwen) AI models
Maker of Qwen 3.7 — open-weights, multilingual, strong on Chinese reasoning.
My take on Alibaba (Qwen)
Alibaba's Qwen 3.7 is the other open-weights frontier model on the bench (alongside GLM-5.2). It is strong on multilingual tasks (Chinese especially) and posted the best fluid-simulation result in the Goldie Bench sample. The sample is small (5 scored tasks), so the average is provisional; I'll keep adding tasks.
Where I use Alibaba (Qwen) inside the Agent OS
Each model below has a "How I use it" line on its detail page. That's the daily-usage view, not the marketing pitch.
Every Alibaba (Qwen) model on Goldie Bench
Click any card for the full model card, every demo, and direct head-to-head comparisons.
How I tested Alibaba (Qwen)'s models
Every model on this page received the exact same fixed prompt as every other model on the bench: one shot, a single HTML file out, scored 0–10 by me on three axes (runs, hits the brief, looks good). The scoring is published in my source comparison guides on agentos.guide; see the methodology page for full data provenance.
Vendor: qwenlm.ai
Run this stack yourself.
Every demo on this bench was built inside the Agent Operating System — one prompt, one shot, single HTML file out. The Agent OS, the prompts, the templates, the weekly walkthroughs and 3,600+ founders shipping with it every day all live inside the AI Profit Boardroom.