Every text model we track, sorted from lowest to highest score on the benchmark you pick. Open-source models appear in green, closed-source models in teal. See who is ahead and watch the gap close as new open weights ship.
Last updated: May 7, 2026
Pick a benchmark
Open Source · Closed Source
Gap on GPQA: +2.7 pts (closed source leads)
Top Open Source
1. Kimi K2.6: 90.5
2. DeepSeek-V4-Pro: 90.1
3. Qwen3.5-397B-A17B: 88.4
Top Closed Source
1. GPT-5.2: 93.2
2. GPT-5.4: 92.8
3. Gemini 3 Pro: 91.9
China vs US Across All Benchmarks
The same models, plotted across every 0-to-100 benchmark we track. Each dot is drawn as the flag of the country where the model was built. Use it to spot where the gap is widest, where it has already closed, and which benchmarks the two ecosystems trade leadership on.
China · United States
Need Help Picking a Model?
We help teams ship the right open or closed model for the job, and the hardware to run it on.
Frequently Asked Questions
What Is the Open vs Closed AI Gap Tracker?
It is a free dot-chart tool that plots every text-generation AI model we follow against a single benchmark, with open-source models colored green and closed-source flagships colored teal. The chart sorts ascending, so the score gap between open weights and closed APIs is easy to read at a glance on any benchmark we track.
How Is a Model Classified as Open or Closed Source?
Each model in our reference database carries an isClosedSource flag. Closed-source means the weights are not publicly distributed and the model is only accessible through an API or first-party product, like GPT-5, Claude Opus, or Gemini. Open-source means the weights are downloadable and runnable locally, like Llama, Qwen, Mistral, and DeepSeek releases.
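As a rough illustration, a record in that database might look something like the sketch below; the field names and color values are assumptions for clarity, not the tracker's actual schema.

```ts
// Hypothetical sketch of a tracked-model record and how its dot color is
// chosen. Field names and colors are illustrative, not the real schema.
interface TrackedModel {
  name: string;                   // e.g. "DeepSeek-V4-Pro"
  isClosedSource: boolean;        // true = API-only, false = downloadable weights
  scores: Record<string, number>; // benchmark name -> published score (0-100)
}

// Open weights render green, closed flagships render teal.
function dotColor(model: TrackedModel): "green" | "teal" {
  return model.isClosedSource ? "teal" : "green";
}
```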
Where Do the Benchmark Scores Come From?
Scores come from the official leaderboards or papers for each benchmark. LM Arena is synced automatically. Academic and research benchmarks like GPQA, MMLU-PRO, GSM8K, SWE-bench, HLE, AIME, HMMT, Terminal Bench, EvasionBench, and olmOCR are entered into our reference model database when vendors or labs publish a number, then refreshed when scores update. Click a benchmark in the explainer to open the source dataset.
Why Are Some Benchmarks Missing Models?
A model only appears on a benchmark if the lab or vendor has published a score for it. Frontier labs report different subsets, and many open-source teams skip the niche benchmarks. The chart hides any benchmark with fewer than four scoring models so the visualization stays meaningful instead of showing two lonely dots.
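In code terms, that visibility rule is just a count filter. The sketch below is a minimal illustration with assumed names, not the tracker's real implementation.

```ts
// Illustrative visibility rule: a benchmark is charted only when at least
// four tracked models have a published score for it. Names are assumptions.
const MIN_MODELS_PER_BENCHMARK = 4;

function visibleBenchmarks(scoresByBenchmark: Record<string, number[]>): string[] {
  return Object.keys(scoresByBenchmark).filter(
    (benchmark) => scoresByBenchmark[benchmark].length >= MIN_MODELS_PER_BENCHMARK,
  );
}
```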
How Often Does the Data Update?
LM Arena scores sync automatically on a recurring schedule, so the LM Arena chart reflects the public leaderboard within hours of a refresh. Other benchmarks update whenever a vendor or research group publishes a new result and our team enters it into the reference database. The Last Updated label in the hero shows the latest sync timestamp across all text benchmarks.
Can I Share or Download the Charts?
Yes. Use the Share button in the hero to copy the page URL and post it anywhere. To save a specific benchmark as a still image, use the Download as Image button next to the chart. The exported PNG includes the benchmark title, the open and closed source legend, the dot chart, and a watermark linking back to the page so attribution stays intact.
Why Does the Chart Sort Ascending?
Sorting from lowest to highest score makes the gap between open-source and closed-source models pop out visually. The eye reads the chart left to right and watches green dots climb. The longer the closed-source teal cluster sits at the top of the chart with no green nearby, the bigger the gap on that benchmark, and the more interesting the next open release becomes.
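On the GPQA numbers in the hero, the +2.7 pt headline gap matches the top closed score minus the top open score (93.2 - 90.5). Below is a minimal sketch of that sort-and-compare logic, with assumed names and a simplified record type.

```ts
// Minimal sketch of the ascending sort and the headline gap
// (top closed score minus top open score). All names are assumptions.
interface ScoredDot {
  model: string;
  score: number;          // published benchmark score, 0-100
  isClosedSource: boolean;
}

// Lowest score on the left, highest on the right.
function sortAscending(dots: ScoredDot[]): ScoredDot[] {
  return [...dots].sort((a, b) => a.score - b.score);
}

// Positive result means the best closed model leads the best open model.
function headlineGap(dots: ScoredDot[]): number {
  const top = (closed: boolean): number =>
    Math.max(...dots.filter((d) => d.isClosedSource === closed).map((d) => d.score));
  return top(true) - top(false); // e.g. 93.2 - 90.5 = +2.7 on GPQA
}
```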
The Open vs Closed AI Gap Tracker plots every text model we follow against a benchmark of your choice. Open-source models render in green, closed-source flagships in teal, and dots are sorted from lowest to highest score so the gap is obvious at a glance. Below is what each benchmark on the page actually measures.
Arena.ai head-to-head ranking for models that read PDFs, slides, and long screenshots to answer questions, normalized to 0-100.
Scores are sourced from the published leaderboards or papers for each benchmark and synced into the Made By Agents reference model database. Closed-source vendors that do not publish a number for a given benchmark simply do not appear on that benchmark's chart.