GPQA (score axis 20–100)
GPT-5.2 · 93.2 (OpenAI, United States)
GPT-5.4 · 92.8 (OpenAI, United States)
Gemini 3 Pro · 91.9 (Google, United States)
Claude Opus 4.6 · 91.3 (Anthropic, United States)
Kimi K2.6 · 90.5 (Moonshot AI, China)
Gemini 3 Flash · 90.4 (Google, United States)
DeepSeek-V4-Pro · 90.1 (DeepSeek, China)
Claude Sonnet 4.6 · 89.9 (Anthropic, United States)
Qwen3.5-397B-A17B · 88.4 (Alibaba, China)
DeepSeek-V4-Flash · 88.1 (DeepSeek, China)
GPT-5.1 · 88.1 (OpenAI, United States)
Qwen3.6-27B · 87.8 (Alibaba, China)
Kimi K2.5 · 87.6 (Moonshot AI, China)
Qwen3.5-122B-A10B · 86.6 (Alibaba, China)
GLM-5.1 · 86.2 (Z.ai, China)
GLM-5 · 86 (Z.ai, China)
Qwen3.6 35B-A3B · 86 (Alibaba, China)
GLM-4.7 · 85.7 (Z.ai, China)
Qwen3.5-27B · 85.5 (Alibaba, China)
minimax-m2.5 · 85.2 (MiniMax, China)
Kimi K2 Thinking · 84.5 (Moonshot AI, China)
Gemma 4 31B IT · 84.3 (Google, United States)
Qwen3.5-35B-A3B · 84.2 (Alibaba, China)
Claude Sonnet 4.5 · 83.4 (Anthropic, United States)
DeepSeek-V3.2 · 82.4 (DeepSeek, China)
Gemma 4 26B-A4B IT · 82.3 (Google, United States)
Qwen3.5-9B · 81.7 (Alibaba, China)
Nvidia Nemotron 3 Super · 79.2 (NVIDIA, United States)
Claude Haiku 4.5 · 75 (Anthropic, United States)
Llama 4 Scout · 73 (Meta, United States)
DeepSeek-R1 · 71.5 (DeepSeek, China)
Gemma 4 E4B IT · 58.6 (Google, United States)
Gemma 4 E2B IT · 43.4 (Google, United States)
Llama 3.1 8B Instruct · 30.4 (Meta, United States)

MMLU-PRO (score axis 55–95)
Gemini 3 Flash · 88.6 (Google, United States)
Qwen3.5-397B-A17B · 87.8 (Alibaba, China)
DeepSeek-V4-Pro · 87.5 (DeepSeek, China)
Kimi K2.5 · 87.1 (Moonshot AI, China)
DeepSeek-V4-Flash · 86.4 (DeepSeek, China)
Qwen3.6-27B · 86.2 (Alibaba, China)
Gemma 4 31B IT · 85.2 (Google, United States)
Qwen3.6 35B-A3B · 85.2 (Alibaba, China)
DeepSeek-V3.2 · 85 (DeepSeek, China)
DeepSeek-R1 · 84 (DeepSeek, China)
Nvidia Nemotron 3 Super · 83.7 (NVIDIA, United States)
Gemma 4 26B-A4B IT · 82.6 (Google, United States)
Qwen3.5-9B · 82.5 (Alibaba, China)
GPT-5.2 · 80 (OpenAI, United States)
Claude Opus 4.6 · 78.5 (Anthropic, United States)
Claude Haiku 4.5 · 72 (Anthropic, United States)
Gemma 4 E4B IT · 69.4 (Google, United States)
DeepSeek-V3 · 64.4 (DeepSeek, China)
Gemma 4 E2B IT · 60 (Google, United States)

GSM8K (score axis 82–96)
DeepSeek-V4-Pro · 92.6 (DeepSeek, China)
DeepSeek-V3 · 89.3 (DeepSeek, China)
Llama 3.1 8B Instruct · 84.5 (Meta, United States)

SWE-Verified (score axis 50–85)
Claude Opus 4.6 · 80.8 (Anthropic, United States)
DeepSeek-V4-Pro · 80.6 (DeepSeek, China)
Kimi K2.6 · 80.2 (Moonshot AI, China)
GPT-5.2 · 80 (OpenAI, United States)
Claude Sonnet 4.6 · 79.6 (Anthropic, United States)
DeepSeek-V4-Flash · 79 (DeepSeek, China)
Claude Sonnet 4.5 · 77.2 (Anthropic, United States)
Qwen3.6-27B · 77.2 (Alibaba, China)
Qwen3.5-397B-A17B · 76.4 (Alibaba, China)
Gemini 3 Pro · 76.2 (Google, United States)
minimax-m2.5 · 75.8 (MiniMax, China)
GPT-5.1 · 74.9 (OpenAI, United States)
GLM-4.7 · 73.8 (Z.ai, China)
Qwen3.6 35B-A3B · 73.4 (Alibaba, China)
GLM-5 · 72.8 (Z.ai, China)
Gemini 3 Flash · 72.5 (Google, United States)
Qwen3.5-27B · 72.4 (Alibaba, China)
Qwen3.5-122B-A10B · 72 (Alibaba, China)
Kimi K2 Thinking · 71.3 (Moonshot AI, China)
Kimi K2.5 · 70.8 (Moonshot AI, China)
DeepSeek-V3.2 · 70 (DeepSeek, China)
Qwen3.5-35B-A3B · 69.2 (Alibaba, China)
Claude Haiku 4.5 · 68 (Anthropic, United States)
Llama 4 Scout · 55 (Meta, United States)
Nvidia Nemotron 3 Super · 53.7 (NVIDIA, United States)

HLE (score axis 0–60)
GPT-5.4 · 52.1 (OpenAI, United States)
Kimi K2.5 · 50.2 (Moonshot AI, China)
DeepSeek-V3.2 · 40.8 (DeepSeek, China)
Claude Opus 4.6 · 40 (Anthropic, United States)
DeepSeek-V4-Pro · 37.7 (DeepSeek, China)
Gemini 3 Pro · 37.5 (Google, United States)
GPT-5.2 · 35.4 (OpenAI, United States)
DeepSeek-V4-Flash · 34.8 (DeepSeek, China)
Kimi K2.6 · 34.7 (Moonshot AI, China)
Gemini 3 Flash · 33.7 (Google, United States)
GLM-5.1 · 31 (Z.ai, China)
Claude Sonnet 4.5 · 30.8 (Anthropic, United States)
GLM-5 · 30.5 (Z.ai, China)
Qwen3.5-397B-A17B · 28.7 (Alibaba, China)
GPT-5.1 · 26 (OpenAI, United States)
Qwen3.5-122B-A10B · 25.3 (Alibaba, China)
GLM-4.7 · 24.8 (Z.ai, China)
Qwen3.5-27B · 24.3 (Alibaba, China)
Qwen3.6-27B · 24 (Alibaba, China)
Kimi K2 Thinking · 23.9 (Moonshot AI, China)
Qwen3.5-35B-A3B · 22.4 (Alibaba, China)
Qwen3.6 35B-A3B · 21.4 (Alibaba, China)
Gemma 4 31B IT · 19.5 (Google, United States)
minimax-m2.5 · 19.4 (MiniMax, China)
Nvidia Nemotron 3 Super · 18.3 (NVIDIA, United States)
Llama 4 Scout · 12 (Meta, United States)
Gemma 4 26B-A4B IT · 8.7 (Google, United States)

AIME 2026 (score axis 30–100)
Claude Opus 4.6 · 100 (Anthropic, United States)
Claude Sonnet 4.5 · 100 (Anthropic, United States)
GPT-5.2 · 100 (OpenAI, United States)
Kimi K2.6 · 96.4 (Moonshot AI, China)
GLM-5 · 95.8 (Z.ai, China)
Kimi K2.5 · 95.8 (Moonshot AI, China)
GLM-5.1 · 95.3 (Z.ai, China)
Gemini 3 Flash · 95 (Google, United States)
Gemini 3 Pro · 95 (Google, United States)
DeepSeek-V3.2 · 94.2 (DeepSeek, China)
Qwen3.6-27B · 94.1 (Alibaba, China)
GPT-5.1 · 94 (OpenAI, United States)
Qwen3.5-35B-A3B · 93.3 (Alibaba, China)
Qwen3.5-397B-A17B · 93.3 (Alibaba, China)
Qwen3.6 35B-A3B · 92.7 (Alibaba, China)
Qwen3.5-9B · 92.5 (Alibaba, China)
Qwen3.5-27B · 90.8 (Alibaba, China)
Nvidia Nemotron 3 Super · 90 (NVIDIA, United States)
Gemma 4 31B IT · 89.2 (Google, United States)
Gemma 4 26B-A4B IT · 88.3 (Google, United States)
Llama 4 Scout · 85 (Meta, United States)
Claude Sonnet 4.6 · 83 (Anthropic, United States)
Gemma 4 E4B IT · 42.5 (Google, United States)
Gemma 4 E2B IT · 37.5 (Google, United States)

Terminal Bench (score axis 20–80)
Claude Opus 4.6 · 74.7 (Anthropic, United States)
DeepSeek-V4-Pro · 67.9 (DeepSeek, China)
Kimi K2.6 · 66.7 (Moonshot AI, China)
GPT-5.2 · 64.9 (OpenAI, United States)
Gemini 3 Flash · 64.3 (Google, United States)
GLM-5.1 · 63.5 (Z.ai, China)
Qwen3.6-27B · 59.3 (Alibaba, China)
DeepSeek-V4-Flash · 56.9 (DeepSeek, China)
Claude Sonnet 4.6 · 53 (Anthropic, United States)
Qwen3.5-397B-A17B · 52.5 (Alibaba, China)
GLM-5 · 52.4 (Z.ai, China)
Qwen3.6 35B-A3B · 51.5 (Alibaba, China)
Claude Sonnet 4.5 · 51 (Anthropic, United States)
Qwen3.5-122B-A10B · 49.4 (Alibaba, China)
Kimi K2.5 · 43.2 (Moonshot AI, China)
Qwen3.5-27B · 41.6 (Alibaba, China)
Qwen3.5-35B-A3B · 40.5 (Alibaba, China)
DeepSeek-V3.2 · 39.6 (DeepSeek, China)
Kimi K2 Thinking · 35.7 (Moonshot AI, China)
Claude Haiku 4.5 · 35.5 (Anthropic, United States)
GLM-4.7 · 33.4 (Z.ai, China)
Nvidia Nemotron 3 Super · 31 (NVIDIA, United States)
Kimi K2 Instruct · 27.8 (Moonshot AI, China)
GLM-4.6 · 24.5 (Z.ai, China)

SWE-Pro (score axis 0–80)
Gemini 3 Flash · 71.2 (Google, United States)
Kimi K2.6 · 58.6 (Moonshot AI, China)
GLM-5.1 · 58.4 (Z.ai, China)
GPT-5.4 · 57.7 (OpenAI, United States)
DeepSeek-V4-Pro · 55.4 (DeepSeek, China)
minimax-m2.5 · 55.4 (MiniMax, China)
Qwen3.6-27B · 53.5 (Alibaba, China)
Kimi K2.5 · 50.7 (Moonshot AI, China)
Qwen3.6 35B-A3B · 49.5 (Alibaba, China)
Claude Opus 4.6 · 45 (Anthropic, United States)
Kimi K2 Instruct · 27.7 (Moonshot AI, China)
Qwen3-235B-A22B · 21.4 (Alibaba, China)
DeepSeek-V3.2 · 15.6 (DeepSeek, China)
Claude Haiku 4.5 · 14 (Anthropic, United States)
Gemma 3 27B IT · 11.4 (Google, United States)
Llama 3.1 405B Instruct · 11.2 (Meta, United States)
GLM-4.6 · 9.7 (Z.ai, China)
Llama 4 Maverick · 5.2 (Meta, United States)
Llama 4 Scout · 5.2 (Meta, United States)

EvasionBench (score axis 65–85)
GLM-4.7 · 82.9 (Z.ai, China)
DeepSeek-V3.2 · 66.9 (DeepSeek, China)
Kimi K2 Instruct 0905 · 66.7 (Moonshot AI, China)

HMMT 2026 (score axis 65–95)
Kimi K2.6 · 92.7 (Moonshot AI, China)
Qwen3.5-397B-A17B · 87.9 (Alibaba, China)
Kimi K2.5 · 87.1 (Moonshot AI, China)
GLM-5 · 86.4 (Z.ai, China)
Nvidia Nemotron 3 Super · 84.8 (NVIDIA, United States)
Qwen3.6-27B · 84.3 (Alibaba, China)
DeepSeek-V3.2 · 84.1 (DeepSeek, China)
Qwen3.6 35B-A3B · 83.6 (Alibaba, China)
GLM-5.1 · 82.6 (Z.ai, China)
Qwen3.5-35B-A3B · 81.8 (Alibaba, China)
Qwen3.5-27B · 81.1 (Alibaba, China)
Qwen3.5-9B · 71.2 (Alibaba, China)

LM Arena (score axis 20–100)
Claude Opus 4.6 (Thinking) · 100 (Anthropic, United States)
Claude Opus 4.6 · 99.4 (Anthropic, United States)
Gemini 3.1 Pro Preview · 98.1 (Google, United States)
Claude Opus 4.7 Thinking · 98.0 (Anthropic, United States)
Gemini 3 Pro · 96.9 (Google, United States)
Claude Opus 4.7 · 96.6 (Anthropic, United States)
Meta Muse Spark · 96.5 (Meta, United States)
Qwen3.5 Max Preview · 95.3 (Alibaba, China)
GPT-5.4 High · 95.3 (OpenAI, United States)
GLM-5.1 · 95.1 (Z.ai, China)
Gemini 3 Flash · 95.0 (Google, United States)
GPT-5.5 · 94.2 (OpenAI, United States)
Gemini 2.5 Pro · 94.0 (Google, United States)
Grok 4.20 Beta 0309 Reasoning · 93.2 (xAI, United States)
Kimi K2.6 · 93.2 (Moonshot AI, China)
Dola Seed 2.0 Pro · 93.1 (ByteDance, China)
GPT-5.4 · 92.8 (OpenAI, United States)
Grok 4.20 Multi-Agent Beta 0309 · 92.8 (xAI, United States)
ERNIE 5.0 0110 · 92.4 (Baidu, China)
Grok 4.20 Beta1 · 92.3 (xAI, United States)
Gemini 3 Flash (Thinking Minimal) · 92.2 (Google, United States)
Claude Sonnet 4.6 · 92.2 (Anthropic, United States)
Claude Opus 4.5 · 92.1 (Anthropic, United States)
Claude Opus 4.5 (Thinking 32K) · 91.9 (Anthropic, United States)
Kimi K2.5 · 91.8 (Moonshot AI, China)
GLM-5 · 91.7 (Z.ai, China)
Qwen3.5-397B-A17B · 91.5 (Alibaba, China)
ERNIE 5.0 Preview 1203 · 91.3 (Baidu, China)
Qwen3.6 Max Preview · 91.2 (Alibaba, China)
Gemma 4 31B IT · 91.2 (Google, United States)
GPT-5.1 High · 91.2 (OpenAI, United States)
GLM-4.6 · 91.0 (Z.ai, China)
Grok 4.1 (Thinking) · 91.0 (xAI, United States)
GPT-5.2 Chat Latest · 90.8 (OpenAI, United States)
Qwen3 Max Preview · 90.8 (Alibaba, China)
Grok 4.1 · 90.5 (xAI, United States)
GLM-4.7 · 90.4 (Z.ai, China)
MiMo v2 Pro · 90.2 (Xiaomi, China)
Gemma 4 26B-A4B IT · 90.1 (Google, United States)
Claude Sonnet 4.5 · 90.1 (Anthropic, United States)
ERNIE 5.0 Preview 1022 · 89.4 (Baidu, China)
Claude Sonnet 4.5 (Thinking 32K) · 89.4 (Anthropic, United States)
GLM-4.5 · 89.3 (Z.ai, China)
ChatGPT-4o Latest (2025-03-26) · 89.2 (OpenAI, United States)
DeepSeek-R1 · 89.1 (DeepSeek, China)
Grok 3 Preview 02-24 · 88.7 (xAI, United States)
DeepSeek-V3.2 · 88.5 (DeepSeek, China)
Gemini 3.1 Flash Lite Preview · 88.4 (Google, United States)
GPT-5.1 · 88.3 (OpenAI, United States)
GPT-5.4 Mini High · 87.9 (OpenAI, United States)
DeepSeek-V3.1 · 87.8 (DeepSeek, China)
Qwen3.5-122B-A10B · 87.8 (Alibaba, China)
Claude Opus 4.1 (Thinking 16K) · 87.7 (Anthropic, United States)
GPT-5.2 High · 87.7 (OpenAI, United States)
Gemini 2.5 Flash · 87.6 (Google, United States)
Claude Opus 4.1 · 87.6 (Anthropic, United States)
GPT-4.5 Preview · 87.5 (OpenAI, United States)
GPT-5.2 · 87.3 (OpenAI, United States)
Kimi K2 Thinking · 87.1 (Moonshot AI, China)
Qwen3 Max (2025-09-23) · 86.9 (Alibaba, China)
Grok 4 (0709) · 86.4 (xAI, United States)
OpenAI o3 · 86.3 (OpenAI, United States)
Grok 4.1 Fast Reasoning · 86.3 (xAI, United States)
Grok 4 Fast Chat · 86.3 (xAI, United States)
Gemini 2.5 Flash Preview 09-2025 · 86.0 (Google, United States)
Hunyuan Vision 1.5 Thinking · 85.9 (Tencent, China)
GPT-5 High · 85.7 (OpenAI, United States)
Qwen3.5-27B · 85.7 (Alibaba, China)
GPT-5 Chat · 85.5 (OpenAI, United States)
Hunyuan T1 · 84.9 (Tencent, China)
Qwen3.5 Flash · 84.8 (Alibaba, China)
Grok 4 Fast Reasoning · 84.5 (xAI, United States)
Qwen3.5-35B-A3B · 84.3 (Alibaba, China)
Qwen3-235B-A22B · 84 (Alibaba, China)
MiniMax M2.7 · 83.8 (MiniMax, China)
Claude Haiku 4.5 · 83.2 (Anthropic, United States)
GPT-5.3 Chat Latest · 82.4 (OpenAI, United States)
GPT-4.1 · 82.2 (OpenAI, United States)
Kimi K2 Instruct 0905 · 81.9 (Moonshot AI, China)
Gemini 2.5 Flash Lite Preview 09-2025 (No Thinking) · 81.8 (Google, United States)
Nvidia Nemotron 3 Super · 81.7 (NVIDIA, United States)
GPT-5.4 Nano High · 81.7 (OpenAI, United States)
Hunyuan TurboS (2025-04-16) · 81.3 (Tencent, China)
Claude Opus 4 (Thinking 16K) · 81.2 (Anthropic, United States)
DeepSeek-V3 · 81.2 (DeepSeek, China)
GPT-5 Mini High · 81.0 (OpenAI, United States)
Kimi K2 Instruct · 80.6 (Moonshot AI, China)
minimax-m2.5 · 80.2 (MiniMax, China)
Gemini 2.5 Flash Lite Preview 06-17 (Thinking) · 80.2 (Google, United States)
Qwen2.5 Max · 79.9 (Alibaba, China)
Grok 3 Mini High · 79.9 (xAI, United States)
OpenAI o1 · 79.8 (OpenAI, United States)
Claude Opus 4 · 79.6 (Anthropic, United States)
Amazon Nova 2 Lite · 79.4 (Amazon, United States)
Grok 3 Mini Beta · 79.3 (xAI, United States)
Gemma 3 27B IT · 78.7 (Google, United States)
Gemini 2.0 Flash · 78.0 (Google, United States)
OpenAI o1 Preview · 77.9 (OpenAI, United States)
OpenAI o4-mini · 77.9 (OpenAI, United States)
Claude Sonnet 4 (Thinking 32K) · 77.2 (Anthropic, United States)
GPT-4.1 Mini · 76.0 (OpenAI, United States)
Qwen3-32B · 76.0 (Alibaba, China)
Claude Sonnet 4 · 75.6 (Anthropic, United States)
OpenAI o3-mini High · 75.4 (OpenAI, United States)
Step 1o Turbo (202506) · 75.2 (StepFun, China)
GLM-4 Plus (0111) · 74.6 (Zhipu, China)
Gemini 2.0 Flash Lite Preview · 74.4 (Google, United States)
Qwen Plus (0125) · 73.9 (Alibaba, China)
Step 2 16K Exp (202412) · 73.1 (StepFun, China)
GPT-5 Nano High · 72.9 (OpenAI, United States)
Hunyuan TurboS (2025-02-26) · 72.9 (Tencent, China)
OpenAI o3-mini · 72.8 (OpenAI, United States)
Qwen3-30B-A3B · 72.5 (Alibaba, China)
OpenAI o1-mini · 72.5 (OpenAI, United States)
Claude 3.7 Sonnet (Thinking 32K) · 72.1 (Anthropic, United States)
Hunyuan Turbo (0110) · 71.7 (Tencent, China)
Grok 2 · 70.6 (xAI, United States)
Yi Lightning · 70.2 (01 AI, China)
GPT-4o · 70.0 (OpenAI, United States)
Gemma 3 4B IT · 68.6 (Google, United States)
Llama 4 Maverick · 68.1 (Meta, United States)
Llama 3.1 405B Instruct · 67.5 (Meta, United States)
Llama 4 Scout · 67.0 (Meta, United States)
Llama 3.3 70B Instruct · 66.2 (Meta, United States)
Llama 3.1 70B Instruct · 64.1 (Meta, United States)
Llama 3 70B Instruct · 58.1 (Meta, United States)
Llama 3.1 8B Instruct · 53.0 (Meta, United States)
Llama 3 8B Instruct · 49.8 (Meta, United States)
Llama 2 70B Chat · 42.3 (Meta, United States)
Llama 2 13B Chat · 37.6 (Meta, United States)
Llama 2 7B Chat · 33.0 (Meta, United States)