Sixteen-dimension benchmark covering temporal coherence, subject consistency, motion quality, and prompt fidelity.
VBench breaks video generation into the dimensions that humans actually notice: subject consistency over time, background consistency, temporal flickering, motion smoothness, dynamic degree, aesthetic quality, imaging quality, object class, multiple objects, human action, color, spatial relationship, scene, appearance style, temporal style, and overall consistency. Each dimension has its own scoring pipeline, and the overall score is a weighted average across all sixteen.
For each dimension, VBench uses a tailored scoring method — object detectors for class fidelity, motion estimators for smoothness, classifiers for style, and so on. Models are run on a fixed prompt set and scored per dimension. The headline number is a weighted aggregate; per-dimension scores are the more actionable read.
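The aggregation step can be sketched as a weighted average over per-dimension scores. The weights and scores below are illustrative placeholders, not the official VBench values:

```python
# Sketch of VBench-style aggregation: each dimension gets its own score
# from a tailored pipeline, then a weighted average produces the headline
# number. Dimension names, weights, and scores here are hypothetical.

DIM_WEIGHTS = {
    "subject_consistency": 1.0,
    "motion_smoothness": 1.0,
    "dynamic_degree": 0.5,   # assumed down-weighting, for illustration
    "imaging_quality": 1.0,
}

def overall_score(per_dim: dict[str, float]) -> float:
    """Weighted average over the dimensions present in per_dim (scores in [0, 1])."""
    total_weight = sum(DIM_WEIGHTS[d] for d in per_dim)
    return sum(DIM_WEIGHTS[d] * s for d, s in per_dim.items()) / total_weight

# Example: a model strong on consistency and motion, weaker on imaging.
scores = {
    "subject_consistency": 0.96,
    "motion_smoothness": 0.98,
    "dynamic_degree": 0.70,
    "imaging_quality": 0.68,
}
print(round(overall_score(scores), 3))  # → 0.849
```

This is why the per-dimension scores are the more actionable read: two models with the same aggregate can have very different per-dimension profiles.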
| # | Model | Lab | Source | Score |
|---|---|---|---|---|
| 01 | Wan2.2-T2V-A14B | Alibaba | Open | 86.2 |
| 02 | Mochi 1 Preview | Genmo AI | Open | 77.4 |
Top closed-source models in 2026 score 82–86% on the overall index; strong open-weight models land between 75% and 80%. Below 70%, failure modes become visible even in casual viewing.