Head-to-head human preference ranking for text-to-image and image-edit models, run by Arena.ai.
Image Arena is the image-generation companion to the Arena.ai chat leaderboard. A user types a prompt, sees two anonymous images, and picks the one they prefer. A Bradley-Terry model fit to these pairwise outcomes produces an Elo-style ranking that reflects real-world preference rather than narrow benchmark scores. Arena.ai now runs two separate image boards: text-to-image at arena.ai/leaderboard/text-to-image and image-edit at arena.ai/leaderboard/image-edit. We report the text-to-image rating here.
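To make the rating step concrete, here is a minimal sketch of fitting a Bradley-Terry model to pairwise wins and mapping the strengths onto an Elo-style scale. It is an illustration under stated assumptions, not Arena.ai's actual pipeline: the function names, the iterative update used for fitting, the 1000-point base, and the 400-point scale are all choices made for this example, and ties are left out for simplicity.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Fit Bradley-Terry strengths from a list of (winner, loser) pairs.

    Uses the classic iterative (minorization-maximization) update.
    Illustrative only; the production method is not described in the source.
    """
    wins = defaultdict(float)          # total wins per model
    pair_counts = defaultdict(float)   # number of battles per unordered pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0.0)
                if n:
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom if denom else strength[m]
        # Rescale so the mean strength stays at 1 (the scale is arbitrary).
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0, scale=400.0):
    """Map strengths to Elo-style ratings (base and scale are assumed values)."""
    # Guard against log10(0) for a model that never won a battle.
    return {m: base + scale * math.log10(max(s, 1e-12)) for m, s in strength.items()}


# Hypothetical vote log: each tuple is (preferred model, other model).
example_battles = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")]
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")]
)
example_ratings = to_elo_scale(fit_bradley_terry(example_battles))
```

Under this model, the probability that one model beats another depends only on the ratio of their strengths, which is why a 3-to-1 win record between two models always maps to the same rating gap regardless of where they sit on the board.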
Voters do not see which model produced which image. Wins, losses, and ties on every pairwise comparison feed into a single rating per model. We normalize the published rating to a 0–100 scale on this page for consistency with the other modalities. The text-to-image board scores prompt-to-image generation; the image-edit board scores conditional edits where the model gets an input image plus an instruction.
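The exact normalization formula is not given here. As a sketch of one plausible approach, the snippet below applies min-max scaling to put raw ratings on a 0–100 scale. Min-max scaling, the function name, and the sample ratings are assumptions made purely for illustration.

```python
def normalize_ratings(ratings, low=0.0, high=100.0):
    """Rescale raw ratings linearly onto [low, high].

    Min-max scaling is an assumption; the page does not state its formula.
    """
    smallest = min(ratings.values())
    largest = max(ratings.values())
    if largest == smallest:
        # All models tied: place everyone at the top of the range.
        return {model: high for model in ratings}
    span = largest - smallest
    return {
        model: low + (high - low) * (value - smallest) / span
        for model, value in ratings.items()
    }


# Hypothetical raw ratings for three models.
raw = {"model_a": 1200.0, "model_b": 1100.0, "model_c": 1000.0}
normalized = normalize_ratings(raw)
```

A linear rescaling like this preserves the ordering and relative gaps between models, but the normalized values shift whenever the top or bottom model changes, so they are comparable only within a single snapshot.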
No scores yet for this benchmark.
Not enough scored models yet.
Image Arena measures what users prefer; GenEval measures whether the image faithfully follows the prompt. An image that follows the prompt closely can still lose a vote to a less faithful one if voters find the other image more aesthetically pleasing.
The ranking is less reliable for non-English use cases. The voter base is mostly English-speaking, and stylistic preferences vary by culture. For non-English use cases, weight GenEval and HPS v2 more heavily.
Based on score correlations across our database.