Head-to-head ranking for models that animate a still input image, with or without a text instruction.
Image-to-Video Arena scores motion conditioned on a still input image. The user provides a photo and optionally describes how it should move, and two anonymous models each generate a clip. Voters pick the better animation. The benchmark rewards subject preservation, plausible motion, and faithful interpretation of the prompt: the skills that matter for "make this photo move" product features.
Each comparison is anonymous. Both models receive the same input image and the same motion instruction, and each produces a short clip. A Bradley-Terry model fit to the pairwise wins yields a single rating per model, normalized to 0–100.
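
As a rough illustration of how pairwise votes can become a 0–100 score, here is a minimal sketch that fits Bradley-Terry strengths with the standard minorization-maximization update and then rescales them. The vote counts, the model names `model_a`, `model_b`, `model_c`, and the min-max rescaling of log-strengths are all assumptions made for the example. The page does not say which fitting procedure or normalization the arena actually uses.

```python
import math
from collections import defaultdict


def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of votes in which a beat b.
    Uses the classic MM update; every model needs at least one win,
    otherwise its strength collapses to zero.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}

    total_wins = defaultdict(float)
    for (winner, _loser), count in wins.items():
        total_wins[winner] += count

    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom if denom else strength[i]
        # Strengths are only defined up to a constant factor, so rescale
        # each round to keep the numbers in a stable range.
        scale = len(models) / sum(updated.values())
        strength = {m: s * scale for m, s in updated.items()}
    return strength


def normalize_0_100(strength):
    """Min-max rescale log-strengths to 0-100 (assumed normalization)."""
    logs = {m: math.log(s) for m, s in strength.items()}
    lo, hi = min(logs.values()), max(logs.values())
    return {m: 100.0 * (v - lo) / (hi - lo) for m, v in logs.items()}


# Hypothetical vote counts: (winner, loser) -> number of votes.
votes = {
    ("model_a", "model_b"): 7, ("model_b", "model_a"): 3,
    ("model_b", "model_c"): 6, ("model_c", "model_b"): 4,
    ("model_a", "model_c"): 8, ("model_c", "model_a"): 2,
}
scores = normalize_0_100(bradley_terry(votes))
for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {score:.1f}")
```

Under a min-max rescaling like this one, the top model always lands on 100 and the bottom model on 0, so the scores describe relative standing within the current pool and not absolute quality.
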
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | Seedance 2.0 | ByteDance | Closed | 100.0 |
| 02 | Kling 2.5 Turbo | Kuaishou | Closed | 76.8 |
| 03 | Runway Gen-4.5 | Runway | Closed | 63.3 |
| 04 | Veo 3.1 | Google | Closed | 62.0 |
| 05 | Veo 3 | Google | Closed | 50.6 |
| 06 | LTX-2 19B | Lightricks | Open | 13.9 |
| 07 | HunyuanVideo-1.5 | Tencent | Open | 6.8 |
| 08 | Wan2.2-T2V-A14B | Alibaba | Open | 0.0 |

Six models with undisclosed parameter counts are not shown. Most closed-source labs do not publish model size.
Use image-to-video whenever you have a reference image you want to animate. It is more controllable and tends to produce more consistent identities, but it is harder to swap subjects mid-clip.