Head-to-head ranking for models that read PDFs, slides, and long screenshots to answer real questions.
Document Arena scores how well models read and reason over real documents. A user uploads a PDF, slide deck, or long screenshot and asks a question that must be answered from its contents. Two anonymous models answer; voters pick the better one. The leaderboard rewards layout-aware reading: pulling the right table cell, the right footnote, the right page out of a long document.
Each comparison is anonymous. Both models receive the same document and question, then produce answers. Wins and losses feed a Bradley-Terry rating that we normalize to 0–100.
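The fitting procedure isn't spelled out here, so below is a minimal sketch of how a Bradley-Terry rating can be fit from pairwise vote tallies and rescaled to 0–100. The minorization-maximization update (Hunter, 2004) and the min-max rescaling of log-strengths are assumptions, and every name in the snippet is illustrative rather than the arena's actual code.

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200, tol: float = 1e-8) -> np.ndarray:
    """Fit Bradley-Terry strengths from a pairwise win matrix.

    wins[i, j] = number of votes where model i beat model j.
    Assumes every model has at least one win and one loss and the
    comparison graph is connected (standard identifiability conditions).
    """
    n = wins.shape[0]
    games = wins + wins.T          # total comparisons between each pair
    total_wins = wins.sum(axis=1)  # W_i: total votes won by model i
    p = np.ones(n)                 # start all models at equal strength
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = games / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = total_wins / denom.sum(axis=1)
        p_new /= p_new.sum()       # strengths are relative; fix the scale
        if np.abs(p_new - p).max() < tol:
            return p_new
        p = p_new
    return p

def to_display_scale(p: np.ndarray) -> np.ndarray:
    """Min-max normalize log-strengths to a 0-100 display score."""
    s = np.log(p)
    return 100.0 * (s - s.min()) / (s.max() - s.min())

# Hypothetical tallies for three models (rows beat columns):
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
print(to_display_scale(bradley_terry(wins)))  # strongest model prints 100, weakest 0
```

One consequence of this kind of normalization: scores are relative to the current pool, so a model's number can shift when new models enter the board even if its own answers haven't changed.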
No scores yet for this benchmark.
olmOCR scores literal document-to-markdown conversion on a fixed set of PDFs with unit-test grading. Document Arena scores open-ended question answering over user-supplied documents with human voting. olmOCR is sharp and reproducible; Document Arena is broad and reflective of real use.
If your workflow is PDFs, contracts, financial reports, or slide decks: Document Arena. If it is photos, diagrams, screenshots of UI, or open-ended visual Q&A: Vision Arena. Their scores correlate across our database, but they capture different real-world failure modes.