Elite university-level competition math problems used as a 2026-fresh test of advanced reasoning.
HMMT is one of the most prestigious high-school math tournaments in the world, written by undergraduates at Harvard and MIT. The problems are harder than AIME and typically demand deeper insight rather than computation. Because the 2026 set was released after training cutoffs for most current models, it is a clean stress test of reasoning under time pressure.
Models are asked to solve each problem and produce a single final answer (an integer or a short closed-form expression). The score is the percentage of correct answers across the set, usually averaged over multiple samples with majority voting.
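The majority-vote scoring described above can be sketched as follows. This is a minimal illustration, not any lab's actual evaluation harness; the function name and data shapes are assumptions.

```python
from collections import Counter

def majority_vote_score(samples_per_problem, reference_answers):
    """Hypothetical scorer: for each problem, take the most common
    sampled answer (majority vote) and compare it to the reference.
    Returns the percentage of problems answered correctly."""
    correct = 0
    for samples, reference in zip(samples_per_problem, reference_answers):
        voted, _count = Counter(samples).most_common(1)[0]
        if voted == reference:
            correct += 1
    return 100.0 * correct / len(reference_answers)

# Toy run: 3 problems, 3 samples each; the third problem's majority
# answer ("127") disagrees with the reference ("128"), so score = 2/3.
samples = [["42", "42", "41"], ["9", "9", "7"], ["127", "127", "128"]]
refs = ["42", "9", "128"]
print(f"{majority_vote_score(samples, refs):.1f}")  # prints "66.7"
```

Note that exact-match comparison on strings is the simplest choice; real harnesses typically also normalize equivalent expressions before comparing.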
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | Kimi K2.6 | Moonshot AI | Open | 92.7 |
| 02 | Qwen3.5-397B-A17B | Alibaba | Open | 87.9 |
| 03 | Kimi K2.5 | Moonshot AI | Open | 87.1 |
| 04 | GLM-5 | Z.ai | Open | 86.4 |
| 05 | Nvidia Nemotron 3 Super | NVIDIA | Open | 84.8 |
| 06 | Qwen3.6-27B | Alibaba | Open | 84.3 |
| 07 | DeepSeek-V3.2 | DeepSeek | Open | 84.1 |
| 08 | Qwen3.6-35B-A3B | Alibaba | Open | 83.6 |
| 09 | GLM-5.1 | Z.ai | Open | 82.6 |
| 10 | Qwen3.5-35B-A3B | Alibaba | Open | 81.8 |
| 11 | Qwen3.5-27B | Alibaba | Open | 81.1 |
| 12 | Qwen3.5-9B | Alibaba | Open | 71.2 |

HMMT is generally harder than AIME. AIME problems can often be solved with one clever observation, whereas HMMT problems usually require a chain of insights. Models tend to score 15–25 points lower on HMMT than on the AIME of the same year.

HMMT holds two contests each year, in November and February. The February set is the larger and harder of the two, and it is the one most labs report on for 2026.