Fifteen elite high-school competition math problems used as a yearly stress test for chain-of-thought reasoning.
The AIME is the qualifying exam for the US national math olympiad (USAMO). Each problem demands creative algebra, geometry, number theory, or combinatorics, and the answer is always an integer from 0 to 999. Because the contest is brand new each year, a fresh AIME set is one of the cleanest tests of genuine mathematical reasoning: a model whose training cutoff predates the contest cannot have seen the answers in training.
Models are asked to solve each problem and produce a single integer answer. The score is the percentage of problems answered with the correct integer. Most leaderboards draw multiple samples per problem and report the majority-vote answer (self-consistency, often written maj@k), which is a different metric from single-sample pass@1.
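The scoring just described can be sketched in a few lines of Python. This is a minimal illustration, not any leaderboard's actual harness: the function names `majority_answer` and `score` and the sample data are hypothetical, and the choice to discard anything that is not an integer in the 0 to 999 range before voting is an assumption.

```python
from collections import Counter

def majority_answer(samples):
    """Return the most common valid answer among one problem's samples.

    Assumption: anything that is not an int in [0, 999] is discarded
    before voting. Returns None if no valid answer remains.
    """
    valid = [a for a in samples if isinstance(a, int) and 0 <= a <= 999]
    if not valid:
        return None
    # most_common(1) yields [(answer, count)]; in CPython a tie goes to
    # the answer that appeared first in the list.
    return Counter(valid).most_common(1)[0][0]

def score(sampled_answers, reference_answers):
    """Percentage of problems whose majority-vote answer matches the reference."""
    correct = sum(
        majority_answer(samples) == ref
        for samples, ref in zip(sampled_answers, reference_answers)
    )
    return 100.0 * correct / len(reference_answers)

# Hypothetical data: three problems, three samples each.
sampled = [[204, 204, 17], [5, 6, 6], [999, 1000, None]]
references = [204, 6, 0]
print(score(sampled, references))
```

In this made-up example the first two problems are won by majority vote and the third is missed, so two of the three problems count as correct.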
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | GPT-5.2 | OpenAI | Closed | 100.0 |
| 02 | Claude Sonnet 4.5 | Anthropic | Closed | 100.0 |
| 03 | Claude Opus 4.6 | Anthropic | Closed | 100.0 |
| 04 | Kimi K2.6 | Moonshot AI | Open | 96.4 |
| 05 | GLM-5 | Z.ai | Open | 95.8 |
| 06 | Kimi K2.5 | Moonshot AI | Open | 95.8 |
| 07 | GLM-5.1 | Z.ai | Open | 95.3 |
| 08 | Gemini 3 Flash | Google | Closed | 95.0 |
| 09 | Gemini 3 Pro | Google | Closed | 95.0 |
| 10 | DeepSeek-V3.2 | DeepSeek | Open | 94.2 |
| 11 | Qwen3.6-27B | Alibaba | Open | 94.1 |
| 12 | GPT-5.1 | OpenAI | Closed | 94.0 |
| 13 | Qwen3.5-35B-A3B | Alibaba | Open | 93.3 |
| 14 | Qwen3.5-397B-A17B | Alibaba | Open | 93.3 |
| 15 | Qwen3.6-35B-A3B | Alibaba | Open | 92.7 |

7 models with undisclosed parameter counts are not shown. Most closed-source labs do not publish model size.
Older benchmarks like GSM8K and MATH are saturated and have partially leaked into training data. AIME 2026 was released after the training cutoffs of most current models, so the score is a clean read on reasoning.
AIME and HMMT are both elite high-school competition math sets. HMMT is generally harder, with fewer per-problem guessing tricks. A model that scores 50% on AIME often scores 25–35% on HMMT.
Leaderboards vary. Some report single-shot accuracy, others report majority vote over 32 or 64 samples. Higher sample counts boost scores by 10–25 points on the same model, so check the methodology before comparing.
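Why the sample count matters when comparing majority-vote scores can be illustrated with a toy calculation. The sketch below assumes a deliberately simplified model that is not taken from any leaderboard: each sample is independently correct with probability `p`, all wrong samples agree on a single wrong answer, and `n` is odd so there are no ties. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent samples is correct.

    Toy assumption: each sample is correct with probability p, and every
    wrong sample gives the same wrong answer (a two-outcome model).
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so that ties cannot occur")
    # Binomial tail: sum over every count k that forms a strict majority.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Same per-sample accuracy, different sample counts.
for n in (1, 3, 33):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions, voting raises the score only when `p` is above 0.5 and lowers it otherwise. Real AIME errors tend to be spread over many different integers, so plurality voting can help at lower per-sample accuracy than this two-outcome model suggests. Either way, the reported number depends on the sample count as well as on the model.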