Head-to-head human preference ranking for text-to-image and image-edit models, run by Arena.ai.
Image Arena is the image-generation companion to the Arena.ai chat leaderboard. A user types a prompt, sees two anonymous images, and picks the one they prefer. A Bradley-Terry model fitted to these pairwise votes produces an Elo-style ranking that reflects real-world preference rather than narrow benchmark scores. Arena.ai now runs two separate image boards: text-to-image at arena.ai/leaderboard/text-to-image and image-edit at arena.ai/leaderboard/image-edit. We report the text-to-image rating here.
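To make the ranking step concrete, here is a minimal sketch of fitting a Bradley-Terry model to pairwise win counts with the standard minorization-maximization update, then mapping the strengths to an Elo-style scale. It is an illustration, not Arena.ai's pipeline: the function names, the model names, the vote counts, and the `base`/`scale` constants are all assumptions made for the example.

```python
import math


def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins maps (winner, loser) -> number of votes. Returns {model: strength},
    normalized so the strengths sum to 1.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            # Total votes won by model i against anyone.
            won = sum(c for (w, _), c in wins.items() if w == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = won / denom if denom else strength[i]
        total = sum(updated.values())
        updated = {m: s / total for m, s in updated.items()}
        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


def to_elo_scale(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-style scale (base and scale are arbitrary here).

    A model with zero wins has strength 0 and no finite rating, so this
    sketch assumes every model has won at least one comparison.
    """
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical vote counts, not real Arena data.
votes = {
    ("model_a", "model_b"): 8, ("model_b", "model_a"): 2,
    ("model_b", "model_c"): 7, ("model_c", "model_b"): 3,
    ("model_a", "model_c"): 9, ("model_c", "model_a"): 1,
}
ratings = to_elo_scale(fit_bradley_terry(votes))
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Only rating differences are meaningful on this scale. One common convention for ties, assumed here rather than confirmed by the source, is to add half a win to each direction of the pair before fitting.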
Voters do not see which model produced which image. Wins, losses, and ties on every pairwise comparison feed into a single rating per model. We normalize the published rating to a 0–100 scale on this page for consistency with the other modalities. The text-to-image board scores prompt-to-image generation; the image-edit board scores conditional edits where the model gets an input image plus an instruction.
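The page does not state the normalization formula. As one plausible reading, the sketch below rescales raw ratings linearly so that a chosen floor maps to 0 and a chosen ceiling maps to 100. The function name, the default anchors (minimum and maximum of the input), and the sample ratings are assumptions for illustration only.

```python
def scale_to_100(ratings, floor=None, ceiling=None):
    """Linearly rescale ratings so floor -> 0 and ceiling -> 100.

    floor and ceiling default to the min and max of the input; the
    anchors actually used for the published scores are not documented.
    """
    lo = min(ratings.values()) if floor is None else floor
    hi = max(ratings.values()) if ceiling is None else ceiling
    if hi == lo:
        # Degenerate case: every model has the same rating.
        return {m: 100.0 for m in ratings}
    return {m: round(100.0 * (r - lo) / (hi - lo), 1) for m, r in ratings.items()}


# Hypothetical raw ratings, not real Arena data.
raw = {"model_a": 1200.0, "model_b": 1100.0, "model_c": 1000.0}
print(scale_to_100(raw))
```

With an explicit `floor` below the weakest listed model, the lowest visible score stays above 0, which is the kind of pattern the table on this page shows.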
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | GPT Image 2 | OpenAI | Closed | 100.0 |
| 02 | GPT Image 1.5 | OpenAI | Closed | 83.4 |
| 03 | Gemini 3 Pro Image (Nano Banana Pro) | Google | Closed | 72.6 |
| 04 | FLUX.2 [pro] | Black Forest Labs | Closed | 65.2 |
| 05 | Gemini 2.5 Flash Image (Nano Banana) | Google | Closed | 59.0 |
| 06 | Qwen-Image-2512 | Alibaba | Open | 59.0 |
| 07 | FLUX.2 [dev] | Black Forest Labs | Open | 58.5 |
| 08 | gpt-image-1 | OpenAI | Closed | 53.0 |
| 09 | Hunyuan-Image 3.0 | Tencent | Open | 50.5 |
| 10 | FLUX.2 [klein] 9B | Black Forest Labs | Open | 50.5 |
| 11 | Hunyuan-Image 3.0 Instruct | Tencent | Open | 47.9 |
| 12 | Z-Image-Turbo | Alibaba | Open | 46.3 |
| 13 | Imagen 4 | Google | Closed | 45.9 |
| 14 | Midjourney V7 | Midjourney | Closed | 38.3 |
| 15 | FLUX.2 [klein] 4B | Black Forest Labs | Open | 37.3 |

9 models with undisclosed parameter counts are not shown. Most closed-source labs do not publish model size.
Image Arena measures what users prefer; GenEval measures whether the image faithfully follows the prompt. A model with strong prompt-following can lose to one with weaker fidelity if the latter's images are more aesthetically pleasing.
The ranking is less representative for non-English use cases: the voter base is mostly English-speaking, and stylistic preferences vary by culture. For those use cases, weight GenEval and HPS v2 more heavily.