A reward model that predicts what humans will prefer, trained on hundreds of thousands of real preference labels.
HPS v2 is a learned reward model: given a prompt and an image, it returns a score estimating how likely a human is to prefer that image. It is used in two ways — as an offline benchmark for image generators, and as a training signal for reinforcement learning from human feedback.
The reward model was trained on hundreds of thousands of pairwise human preference labels across styles ranging from anime to photo to concept art. The benchmark score is the average HPS v2 reward across a standard set of prompts.
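The averaging step above can be sketched in a few lines. This is a minimal illustration only: `hps_v2_score` here is a hypothetical dummy stand-in for the real learned model, and `generate` stands in for an image generator.

```python
from statistics import mean

# Hypothetical stand-in for the learned reward model. A real scorer
# embeds the prompt and the image and returns a preference score;
# this dummy just measures word overlap so the sketch runs end to end.
def hps_v2_score(prompt: str, image: str) -> float:
    overlap = set(prompt.lower().split()) & set(image.lower().split())
    return len(overlap) / max(len(prompt.split()), 1)

def benchmark_score(prompts, generate) -> float:
    """Benchmark aggregate: average reward across a fixed prompt set."""
    return mean(hps_v2_score(p, generate(p)) for p in prompts)

prompts = ["a red fox in the snow", "concept art of a floating city"]
fake_generator = lambda p: p  # a real generator returns an image, not text
print(round(benchmark_score(prompts, fake_generator), 3))
```

The key property is that the benchmark number is just a mean over a standard prompt list, so it is only comparable between models scored on the same prompt set.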
No scores yet for this benchmark.
Not enough scored models yet.
Yes, like any learned reward model, it can be gamed. Models explicitly trained against HPS v2 can win the benchmark while producing images that feel less natural to humans. Always sanity-check with at least one fixed, non-learned evaluation such as GenEval.
Based on score correlations across our database.