MMMU-Pro: Massive Multi-discipline Multimodal Understanding

Name: MMMU-Pro: Massive Multi-discipline Multimodal Understanding
Creator: Tsinghua University, Berkeley, and CMU
Published: 2024
Keywords: MMMU-Pro, AI benchmark, multimodal model evaluation, Tsinghua University, Berkeley, and CMU

Hard image-plus-text reasoning across 30 college subjects, the multimodal counterpart to MMLU-Pro.

How It Works

MMMU-Pro tests college-level multimodal reasoning. Every question requires reading a diagram, chart, photo, or scientific figure alongside the text. It is a harder, more contamination-resistant variant of the original MMMU benchmark.

Each item is a multiple-choice question with exactly one correct answer, and the model must read the image and text together to answer it. Relative to MMMU, the Pro variant removes items that can be solved from the text alone and adds harder distractors.
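Since every item has a single correct option, scoring reduces to exact-match accuracy on the chosen letter. The sketch below is illustrative only: `extract_choice` and `accuracy` are hypothetical helper names, and the assumptions that options are lettered A to J and that responses contain a phrase such as "Answer: C" are ours, not taken from the benchmark's own evaluation code.

```python
import re
from typing import List, Optional

# Assumption: options are lettered A-J and the model states its pick
# as "answer is X" or "Answer: X". Official scripts may parse differently.
ANSWER_RE = re.compile(r"answer\s*(?:is|:)\s*\(?([A-J])(?![A-Za-z])", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter the response commits to, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None


def accuracy(responses: List[str], gold: List[str]) -> float:
    """Fraction of responses whose extracted letter equals the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        return 0.0
    # An unparseable response (None) never equals a gold letter, so it counts as wrong.
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)
```

Counting unparseable responses as wrong is a deliberate choice here: it avoids rewarding a model for failing to commit to an option.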

Dataset size: around 3,500 college-level multimodal questions across 30 subjects, each requiring an image to answer.
Mean score: 0.0 (no models scored yet)
Median score: 0.0 (no models scored yet)
Open / Closed models scored: 0 / 0

Top Scorers

No scores yet for this benchmark.

Score Distribution

Not enough scored models yet.

Most Correlated Benchmarks

Not enough scored models yet.

What It Captures Well

Broad subject coverage with text-only-solvable questions filtered out.
Closed-book reasoning that genuinely needs vision.
Tests both expert knowledge and image understanding.

Where It Falls Short

Requires a vision-capable model.
Multiple-choice scoring masks reasoning quality, since only the final answer is checked.
Some subjects are noisier than others.

Frequently Asked Questions

Is MMMU-Pro the same as MMMU?

No. MMMU-Pro is the harder, cleaner variant: it drops questions that can be solved from the text alone and adds harder distractors. Use it when evaluating frontier multimodal models.
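The idea of dropping text-only-solvable questions can be sketched as a simple filter: run text-only models on each question with the image withheld, then keep only the questions they fail. The function name, the input layout, and the `max_solvers` threshold below are all assumptions made for illustration; the benchmark's actual filtering procedure may differ.

```python
from typing import Dict, List


def filter_text_only_solvable(
    text_only_correct: Dict[str, List[bool]],
    max_solvers: int = 0,
) -> List[str]:
    """Keep question IDs that at most `max_solvers` text-only models got right.

    `text_only_correct` maps a question ID to one boolean per text-only
    model, True meaning that model answered correctly without the image.
    """
    return [
        qid
        for qid, results in text_only_correct.items()
        if sum(results) <= max_solvers
    ]


# Example: three hypothetical text-only models, three questions.
results = {
    "q1": [True, True, False],    # solvable from text alone, dropped
    "q2": [False, False, False],  # needs the image, kept
    "q3": [True, False, False],   # one lucky guess, kept when max_solvers=1
}
kept = filter_text_only_solvable(results, max_solvers=1)
```

A threshold above zero tolerates the occasional lucky guess, which matters for multiple-choice items where random guessing is sometimes right.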

Related Benchmarks

Based on score correlations across our database.

Pearson r —
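For readers unfamiliar with the statistic, the following is a minimal sketch of how a Pearson r between two benchmarks could be computed over the models scored on both. The function name, the dictionary layout, and the `min_models` cutoff are assumptions for illustration and do not describe how this site computes its figures.

```python
import math
from typing import Dict, Optional


def pearson_r(
    a: Dict[str, float],
    b: Dict[str, float],
    min_models: int = 3,
) -> Optional[float]:
    """Pearson correlation of two benchmarks over models scored on both.

    Returns None when fewer than `min_models` models are shared or when
    either benchmark's scores have zero variance.
    """
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < min_models:
        return None
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

Returning `None` rather than a number for small samples mirrors the "not enough scored models" state: a correlation over one or two models would be meaningless.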

GPQA

MMLU-PRO

GSM8K

SWE-Verified