Elite university-level competition math problems used as a 2026-fresh test of advanced reasoning.
HMMT is one of the most prestigious high-school math tournaments in the world, written by undergraduates at Harvard and MIT. The problems are harder than AIME and typically demand deeper insight rather than computation. Because the 2026 set was released after training cutoffs for most current models, it is a clean stress test of reasoning under time pressure.
Models are asked to solve each problem and produce a single final answer (an integer or a short closed-form expression). The score is the percentage of correct answers across the set, usually averaged over multiple samples with majority voting.
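The majority-vote scoring described above can be sketched as follows. This is a minimal illustration, not any lab's actual evaluation harness; the function name and data shapes are assumptions.

```python
from collections import Counter

def majority_vote_score(samples_per_problem, reference_answers):
    """Hypothetical scorer: for each problem, take the most common
    sampled answer (majority vote) and compare it to the reference.
    Returns the percentage of problems answered correctly."""
    correct = 0
    for samples, reference in zip(samples_per_problem, reference_answers):
        voted, _count = Counter(samples).most_common(1)[0]
        if voted == reference:
            correct += 1
    return 100.0 * correct / len(reference_answers)

# Toy run: 3 problems, 3 samples each; the third problem's majority
# answer ("127") disagrees with the reference ("128"), so score = 2/3.
samples = [["42", "42", "41"], ["9", "9", "7"], ["127", "127", "128"]]
refs = ["42", "9", "128"]
print(f"{majority_vote_score(samples, refs):.1f}")  # prints "66.7"
```

Note that exact-match comparison on strings is the simplest choice; real harnesses typically also normalize equivalent expressions before comparing.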
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | Kimi K2.6 | Moonshot AI | Open | 92.7 |
| 02 | Qwen3.5-397B-A17B | Alibaba | Open | 87.9 |
| 03 | Kimi K2.5 | Moonshot AI | Open | 87.1 |
| 04 | GLM-5 | Z.ai | Open | 86.4 |
| 05 | Nvidia Nemotron 3 Super | NVIDIA | Open | 84.8 |
| 06 | Qwen3.6-27B | Alibaba | Open | 84.3 |
| 07 | DeepSeek-V3.2 | DeepSeek | Open | 84.1 |
| 08 | Qwen3.6-35B-A3B | Alibaba | Open | 83.6 |
| 09 | GLM-5.1 | Z.ai | Open | 82.6 |
| 10 | Qwen3.5-35B-A3B | Alibaba | Open | 81.8 |
| 11 | Qwen3.5-27B | Alibaba | Open | 81.1 |
| 12 | Qwen3.5-9B | Alibaba | Open | 71.2 |

HMMT is generally harder than AIME. AIME problems can often be solved with one clever observation, whereas HMMT problems usually require a chain of insights. Models tend to score 15–25 points lower on HMMT than on the AIME of the same year.

HMMT holds two contests each year, in November and February. The February set is the larger and harder of the two, and it is the one most labs report on for 2026.