Head-to-head human preference ranking for text-to-video, image-to-video, and video-edit models.
Video Arena is the video-generation companion to the Arena.ai chat leaderboard. Users see two anonymous generated clips for the same prompt and vote on which they prefer. The pairwise wins drive a Bradley-Terry rating that reflects general taste rather than any single technical dimension. Arena.ai now runs three separate video boards: text-to-video at arena.ai/leaderboard/text-to-video, image-to-video at arena.ai/leaderboard/image-to-video, and video-edit at arena.ai/leaderboard/video-edit. We report the text-to-video rating on this page.
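As a rough illustration of how pairwise votes can become a Bradley-Terry rating, here is a minimal sketch. It assumes votes arrive as (winner, loser) pairs with no ties. The function name fit_bradley_terry and the model labels are hypothetical. It uses the textbook minorization-maximization update, which is not necessarily how Arena.ai computes its published ratings.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each model to a strength; strengths sum to 1.
    Under the model, P(i beats j) = strength[i] / (strength[i] + strength[j]).
    """
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    models = set()
    for winner, loser in votes:
        if winner == loser:
            continue  # a model cannot be compared against itself
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strengths = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Sum n_ij / (p_i + p_j) over every opponent j that m has faced.
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strengths[m] + strengths[other])
            updated[m] = wins[m] / denom if denom > 0 else strengths[m]
        # Rescale so the strengths stay on a fixed scale between iterations.
        total = sum(updated.values())
        strengths = {m: s / total for m, s in updated.items()}
    return strengths


# Hypothetical votes: model_a beats model_b three times out of four.
example = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
strengths = fit_bradley_terry(example)
```

With only two models the fitted strengths reduce to each model's share of the votes, which makes the sketch easy to sanity-check before applying it to a larger pool.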
Voting is blind: voters never see which model produced which clip. Each model's aggregate wins feed its Bradley-Terry rating, and we normalize the published rating to a 0–100 scale here so it can be read alongside VBench. The three boards score different tasks: text-to-video scores prompt-to-clip generation, image-to-video scores motion conditioned on an input image, and video-edit scores edits to an input clip guided by a text instruction.
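The exact normalization is not specified on this page. The sketch below shows one plausible approach, min-max scaling, purely as an assumption. The function name normalize_to_100, the model labels, and the rating values are hypothetical placeholders, not real leaderboard data.

```python
def normalize_to_100(ratings):
    """Min-max scale a dict of model -> rating onto a 0-100 range.

    Assumption: the lowest-rated model maps to 0 and the highest to 100.
    The page's actual normalization may differ.
    """
    if not ratings:
        return {}
    lowest = min(ratings.values())
    highest = max(ratings.values())
    if highest == lowest:
        # All ratings equal: the scale is undefined, so pin everything to 100.
        return {model: 100.0 for model in ratings}
    span = highest - lowest
    return {model: 100.0 * (r - lowest) / span for model, r in ratings.items()}


# Placeholder ratings for illustration only.
scaled = normalize_to_100({"model_a": 1200, "model_b": 1100, "model_c": 1000})
```

A min-max scale depends on which models are in the pool, so scores produced this way would shift whenever a new best or worst model is added. That is one reason to treat this as a sketch and not as the method used here.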
No scores yet for this benchmark.
Not enough scored models yet.
Video Arena measures general human preference. VBench breaks the task into 16 specific quality dimensions and reports each one. Use both together: VBench tells you where the weaknesses are; Arena tells you whether users care.
Based on score correlations across our database.