Fourteen hundred real PDFs that test whether a model can turn messy documents into clean, structured markdown.
olmOCR-Bench is the strongest public test of document-to-markdown ability. Each task gives a model one PDF page; the model must output structured markdown that preserves headings, tables, math, and reading order. Instead of fuzzy similarity scoring, the benchmark uses targeted unit tests ("does this exact phrase end up under this exact heading?"), which makes results reliable.
For each PDF, a battery of unit tests checks specific properties of the markdown output: cell values in tables, equation contents, paragraph order, and so on. The total score is the fraction of unit tests passed across the dataset.
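The scoring scheme above can be sketched in a few lines. This is a minimal illustration of the idea, not the benchmark's actual harness: the test shapes (a phrase appearing under a given heading) and the function names are assumptions made here for clarity.

```python
# Sketch of unit-test-style scoring for markdown output.
# phrase_under_heading and the test format below are illustrative
# assumptions, not olmOCR-Bench's real test schema.

def phrase_under_heading(md: str, heading: str, phrase: str) -> bool:
    """Pass if `phrase` appears in the section opened by `heading`."""
    in_section = False
    for line in md.splitlines():
        if line.startswith("#"):
            # A new heading starts (or ends) the section of interest.
            in_section = line.lstrip("#").strip() == heading
        elif in_section and phrase in line:
            return True
    return False

def score(outputs: dict, tests: list) -> float:
    """Fraction of unit tests passed across the dataset."""
    passed = sum(1 for doc_id, test in tests if test(outputs[doc_id]))
    return passed / len(tests)

# Tiny worked example: one document, two unit tests.
outputs = {"doc1": "# Results\nAccuracy was 94.2%.\n# Methods\nWe used X."}
tests = [
    ("doc1", lambda md: phrase_under_heading(md, "Results", "94.2%")),
    ("doc1", lambda md: phrase_under_heading(md, "Methods", "94.2%")),
]
print(score(outputs, tests))  # → 0.5 (first test passes, second fails)
```

Because each test checks one concrete, binary property, a failure points at a specific defect (a misplaced phrase, a wrong table cell) rather than a vague similarity penalty.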
No scores yet for this benchmark.
The strongest vision-language models match or beat dedicated commercial OCR on clean documents, especially when tables and math are involved. They still lag on batch throughput, which is where commercial OCR services win.
Yes: a high score is a strong signal for any "screenshot to structured data" workflow, including web screenshots, scanned forms, and slide decks.
This estimate is based on score correlations across our database.