Video Arena: Arena.ai Video Leaderboard

Name: Video Arena: Arena.ai Video Leaderboard
Creator: Arena.ai (formerly LMSYS)
Published: 2024
Keywords: Video Arena, AI benchmark, video model evaluation, Arena.ai (formerly LMSYS)

Head-to-head human preference ranking for text-to-video, image-to-video, and video-edit models.

Open Dataset

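Scores are min-max normalized. Arena.ai publishes raw Bradley-Terry / Elo ratings; we rescale them linearly to a 0–100 axis across every scored model so they sit next to accuracy-style benchmarks. By construction, the top-rated model scores 100.0 and the lowest-rated model scores 0.0. Because the rescaling is monotonic, rankings stay the same as on arena.ai.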
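The rescaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual pipeline: the function name `min_max_normalize`, the one-decimal rounding, and the raw ratings in `raw` are all assumptions made up for the example.

```python
def min_max_normalize(ratings: dict[str, float]) -> dict[str, float]:
    """Linearly rescale raw ratings so the lowest maps to 0 and the highest to 100."""
    lo = min(ratings.values())
    hi = max(ratings.values())
    if hi == lo:
        raise ValueError("need at least two distinct ratings to rescale")
    return {
        name: round(100.0 * (rating - lo) / (hi - lo), 1)
        for name, rating in ratings.items()
    }


# Hypothetical raw ratings, NOT real Arena.ai values.
raw = {"model_a": 1300.0, "model_b": 1200.0, "model_c": 1100.0}
scores = min_max_normalize(raw)

# The ordering by normalized score is the same as the ordering by raw rating.
order_raw = sorted(raw, key=raw.get, reverse=True)
order_scaled = sorted(scores, key=scores.get, reverse=True)
```

One consequence of this design is that a normalized score depends on which models are in the pool: adding a new best or worst model shifts every other score even though the raw ratings are unchanged.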

Models Tested: 11
Top Score: 100.0
Published: 2024
Source: Arena.ai (formerly LMSYS)

How It Works

Video Arena is the video-generation companion to the Arena.ai chat leaderboard. Users see two anonymous generated clips for the same prompt and vote on which they prefer. The pairwise wins drive a Bradley-Terry rating that reflects general taste rather than any single technical dimension. Arena.ai now runs three separate video boards: text-to-video at arena.ai/leaderboard/text-to-video, image-to-video at arena.ai/leaderboard/image-to-video, and video-edit at arena.ai/leaderboard/video-edit. We report the text-to-video rating on this page.

Voting is blind: voters do not see which model produced which clip. We normalize the published rating to a 0–100 scale here so it can be read alongside VBench. The three boards score different tasks: text-to-video scores prompt-to-clip generation, image-to-video scores motion conditioned on an input image, and video-edit scores edits applied to an input clip from a text instruction.
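To make the step from pairwise votes to a rating concrete, here is a minimal sketch of a Bradley-Terry fit using the classic iterative (MM) update. It is an illustration only: Arena.ai's exact fitting procedure is not described on this page, and the vote counts, the function names, and the Elo-like display scale (offset 1000, factor 400) are assumptions chosen for the example.

```python
import math


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a matrix of pairwise wins.

    wins[i][j] is the number of votes in which model i beat model j.
    Assumes every model has at least one win and one comparison.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Strengths are only defined up to a constant factor, so fix their sum to n.
        scale = n / sum(updated)
        strengths = [s * scale for s in updated]
    return strengths


def win_probability(strength_a, strength_b):
    """Bradley-Terry probability that model A is preferred over model B."""
    return strength_a / (strength_a + strength_b)


# Hypothetical vote counts for three anonymous models, NOT real Arena data.
votes = [
    [0, 8, 9],  # model 0 beat model 1 eight times and model 2 nine times
    [2, 0, 7],
    [1, 3, 0],
]
strengths = fit_bradley_terry(votes)

# Assumed display convention: map strengths onto an Elo-like scale.
elo_like = [1000 + 400 * math.log10(s) for s in strengths]
```

The fitted strengths only carry meaning relative to each other, which is why a separate rescaling step is needed before they can be shown on a fixed axis.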

Dataset size: Tens of thousands of anonymous side-by-side video comparisons over real prompts, split into text-to-video, image-to-video, and video-edit boards.
Mean score: 56.7
Median score: 64.2
Open / Closed: 6 / 5

Top Scorers

#	Model	Lab	Source	Score
01	Seedance 2.0	ByteDance	Closed	100.0
02	Runway Gen-4.5	Runway	Closed	82.1
03	Veo 3	Google	Closed	80.7
04	Kling 2.5 Turbo	Kuaishou	Closed	77.4
05	Veo 3.1	Google	Closed	76.6
06	Kandinsky 5.0 Video Pro	Kandinsky	Open	64.2
07	LTX-2 19B	Lightricks	Open	49.3
08	Wan2.2-T2V-A14B	Alibaba	Open	42.3
09	Kandinsky 5.0 Video Lite	Kandinsky	Open	42.0
10	HunyuanVideo-1.5	Tencent	Open	9.5
11	Mochi 1 Preview	Genmo AI	Open	0.0

Open vs Closed Source

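Gap on Video Arena: +35.8 pts, closed leads. This matches the difference between the best closed model (Seedance 2.0, 100.0) and the best open model (Kandinsky 5.0 Video Pro, 64.2).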
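The summary figures on this page can be re-derived from the Top Scorers table. The sketch below copies the table's scores and recomputes the mean, median, open/closed counts, and gap. Reading the gap as "best closed score minus best open score" is an assumption; it is used here because it reproduces the published +35.8.

```python
from statistics import mean, median

# (model, source, normalized score), copied from the Top Scorers table.
rows = [
    ("Seedance 2.0", "Closed", 100.0),
    ("Runway Gen-4.5", "Closed", 82.1),
    ("Veo 3", "Closed", 80.7),
    ("Kling 2.5 Turbo", "Closed", 77.4),
    ("Veo 3.1", "Closed", 76.6),
    ("Kandinsky 5.0 Video Pro", "Open", 64.2),
    ("LTX-2 19B", "Open", 49.3),
    ("Wan2.2-T2V-A14B", "Open", 42.3),
    ("Kandinsky 5.0 Video Lite", "Open", 42.0),
    ("HunyuanVideo-1.5", "Open", 9.5),
    ("Mochi 1 Preview", "Open", 0.0),
]

scores = [score for _, _, score in rows]
best_closed = max(score for _, source, score in rows if source == "Closed")
best_open = max(score for _, source, score in rows if source == "Open")

summary = {
    "models": len(rows),
    "mean": round(mean(scores), 1),
    "median": round(median(scores), 1),
    "open": sum(1 for _, source, _ in rows if source == "Open"),
    "closed": sum(1 for _, source, _ in rows if source == "Closed"),
    # Assumed definition: best closed score minus best open score.
    "gap": round(best_closed - best_open, 1),
}
print(summary)
```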

Top Open-Source Models

#	Model	Score
1	Kandinsky 5.0 Video Pro	64.2
2	LTX-2 19B	49.3
3	Wan2.2-T2V-A14B	42.3

Top Closed-Source Models

#	Model	Score
1	Seedance 2.0	100.0
2	Runway Gen-4.5	82.1
3	Veo 3	80.7

Score vs Parameter Count

Six models with undisclosed parameter counts are not shown. Most closed-source labs do not publish model size.

Average Score by Lab

Google: 78.7 (n = 2)
Kandinsky: 53.1 (n = 2)

Most Correlated Benchmarks

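Image-to-Video Arena: r = +0.89 (n = 8)
Pearson r ranges from −1 to +1. Positive means models that score higher on one benchmark tend to score higher on the other, so the two rank models in a similar order; negative means the opposite. Here n is the number of models scored on both benchmarks.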
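For readers who want to check a correlation figure like this themselves, the following is a minimal sketch of the Pearson r calculation. The two score lists are hypothetical and are not the eight shared models behind the +0.89 above; `pearson_r` is a helper written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical scores for the same four models on two benchmarks.
bench_a = [10.0, 20.0, 30.0, 40.0]
bench_b = [12.0, 25.0, 33.0, 47.0]

r_similar = pearson_r(bench_a, bench_b)          # benchmarks agree on ordering
r_opposite = pearson_r(bench_a, bench_b[::-1])   # second benchmark reversed
```

With only a handful of shared models, a single outlier can move r substantially, so small-n correlations are best read as a rough signal.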

What It Captures Well

Holistic: captures motion quality, consistency, prompt fidelity, and aesthetics in one number.
Uses real prompts from real users.
Updated continuously as new generators are released.

Where It Falls Short

New entrants get few votes initially, so their early ratings can be noisy.
Preference is subjective and shifts with viewer expectations over time.
A single preference vote does not reveal which specific failure cost the model the comparison.

Frequently Asked Questions

How is Video Arena different from VBench?

Video Arena measures general human preference. VBench breaks the task into 16 specific quality dimensions and reports each one. Use both together: VBench tells you where the weaknesses are; Arena tells you whether users care.

Related Benchmarks

Based on score correlations across our database.

Image-to-Video Arena (Pearson r +0.89)
VBench
Video Edit Arena
