Fourteen hundred real PDFs that test whether a model can turn messy documents into clean, structured markdown.
olmOCR-Bench is the strongest public test of document-to-markdown ability. Each task gives a model one PDF page; the model must output structured markdown that preserves headings, tables, math, and reading order. Instead of fuzzy similarity scoring, the benchmark uses targeted unit tests ("does this exact phrase end up under this exact heading?"), which makes results reliable.
For each PDF, a battery of unit tests checks specific properties of the markdown output: cell values in tables, equation contents, paragraph order, and so on. The total score is the fraction of unit tests passed across the dataset.
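The scoring scheme above can be sketched in a few lines. This is a minimal illustration of the idea, not the benchmark's actual harness: the test shapes (a phrase appearing under a given heading) and the function names are assumptions made here for clarity.

```python
# Sketch of unit-test-style scoring for markdown output.
# phrase_under_heading and the test format below are illustrative
# assumptions, not olmOCR-Bench's real test schema.

def phrase_under_heading(md: str, heading: str, phrase: str) -> bool:
    """Pass if `phrase` appears in the section opened by `heading`."""
    in_section = False
    for line in md.splitlines():
        if line.startswith("#"):
            # A new heading starts (or ends) the section of interest.
            in_section = line.lstrip("#").strip() == heading
        elif in_section and phrase in line:
            return True
    return False

def score(outputs: dict, tests: list) -> float:
    """Fraction of unit tests passed across the dataset."""
    passed = sum(1 for doc_id, test in tests if test(outputs[doc_id]))
    return passed / len(tests)

# Tiny worked example: one document, two unit tests.
outputs = {"doc1": "# Results\nAccuracy was 94.2%.\n# Methods\nWe used X."}
tests = [
    ("doc1", lambda md: phrase_under_heading(md, "Results", "94.2%")),
    ("doc1", lambda md: phrase_under_heading(md, "Methods", "94.2%")),
]
print(score(outputs, tests))  # → 0.5 (first test passes, second fails)
```

Because each test checks one concrete, binary property, a failure points at a specific defect (a misplaced phrase, a wrong table cell) rather than a vague similarity penalty.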
No scores yet for this benchmark.
The strongest vision-language models match or beat dedicated commercial OCR on clean documents, especially when tables and math are involved. They still lag on batch throughput, which is where commercial OCR services win.
Yes: a high score is a strong signal for any "screenshot to structured data" workflow, including web screenshots, scanned forms, and slide decks.
This estimate is based on score correlations across our database.