Head-to-head ranking for models that edit an input image given a text instruction.
Image Edit Arena scores models on a different image task than text-to-image generation: a user provides an image plus a short instruction ("add a hat", "make it night", "remove the person on the left") and the model returns the edit. Voters compare two anonymous edits and pick the better one. The benchmark rewards faithful localization (changing only what was asked), preservation of the rest of the image, and instruction-following accuracy — skills that pure text-to-image models often lack.
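Concretely, each vote can be thought of as one record pairing the shared inputs with the two anonymous outputs. A minimal sketch in Python; the `EditComparison` name and fields are illustrative assumptions, not the benchmark's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EditComparison:
    """One anonymous head-to-head edit comparison (hypothetical schema)."""
    source_image: str   # path or URL of the shared input image
    instruction: str    # shared edit instruction, e.g. "add a hat"
    output_a: str       # edited image from anonymous model A
    output_b: str       # edited image from anonymous model B
    winner: str         # voter's pick: "a" or "b"
```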
Each comparison is anonymous: both models receive the same input image and the same edit instruction, then produce an edited image. Fitting a Bradley-Terry model to the pairwise wins yields a single strength per model, which we normalize to a 0–100 score.
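The rating step is standard enough to sketch. Below is a minimal Python illustration using the classic minorization-maximization updates for Bradley-Terry; the `votes` list of (model_a, model_b, winner) tuples is hypothetical, and min-max scaling of log-strengths to 0–100 is one plausible normalization, not necessarily the exact one used here.

```python
import math
from collections import defaultdict

# Hypothetical vote log: one (model_a, model_b, winner) tuple per comparison.
votes = [
    ("edit-x", "edit-y", "edit-x"),
    ("edit-x", "edit-z", "edit-z"),
    ("edit-y", "edit-z", "edit-y"),
    ("edit-x", "edit-y", "edit-x"),
]

def bradley_terry(votes, iters=500):
    """Fit Bradley-Terry strengths p with the MM updates:
    p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j).
    Assumes every model has at least one win; otherwise its
    maximum-likelihood strength is zero and the log below fails."""
    wins = defaultdict(int)  # wins[(i, j)]: times i beat j
    models = set()
    for a, b, winner in votes:
        models.update((a, b))
        loser = b if winner == a else a
        wins[(winner, loser)] += 1

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            w_i = sum(wins[(i, j)] for j in models)  # total wins of i
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (p[i] + p[j])
                for j in models
                if j != i and wins[(i, j)] + wins[(j, i)] > 0
            )
            new_p[i] = w_i / denom if denom else p[i]
        # Strengths are identified only up to a constant factor,
        # so fix the scale at geometric mean 1 each iteration.
        g = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        p = {m: v / g for m, v in new_p.items()}
    return p

def to_scores(p):
    """Min-max scale log-strengths to 0-100 (one plausible display choice)."""
    logs = {m: math.log(v) for m, v in p.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # guard against a degenerate single-model case
    return {m: round(100 * (v - lo) / span, 1) for m, v in logs.items()}

print(to_scores(bradley_terry(votes)))
```

The MM update is a fixed-point iteration on the Bradley-Terry likelihood; with a strongly connected win graph (every model both wins and loses somewhere) it converges to the unique maximum-likelihood strengths.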
No scores yet for this benchmark.
Image Arena (text-to-image) starts from a blank canvas — a prompt becomes an image. Image Edit Arena starts from an existing image — a prompt plus the source becomes an edited image. Strong text-to-image models often score poorly on edits and vice versa.
Use this benchmark for any product that lets users edit photos with natural language: photo retouching, marketing-asset variations, conditional generation, or inpainting. For from-scratch generation, prioritize Image Arena and GenEval.