A reward model that predicts what humans will prefer, trained on hundreds of thousands of real preference labels.
HPS v2 is a learned reward model: given a prompt and an image, it returns a score estimating how likely a human is to prefer that image. It is used in two ways — as an offline benchmark for image generators, and as a training signal for reinforcement learning from human feedback.
The reward model was trained on hundreds of thousands of pairwise human preference labels across styles ranging from anime to photo to concept art. The benchmark score is the average HPS v2 reward across a standard set of prompts.
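The averaging step above can be sketched in a few lines. This is a minimal illustration only: `hps_v2_score` here is a hypothetical dummy stand-in for the real learned model, and `generate` stands in for an image generator.

```python
from statistics import mean

# Hypothetical stand-in for the learned reward model. A real scorer
# embeds the prompt and the image and returns a preference score;
# this dummy just measures word overlap so the sketch runs end to end.
def hps_v2_score(prompt: str, image: str) -> float:
    overlap = set(prompt.lower().split()) & set(image.lower().split())
    return len(overlap) / max(len(prompt.split()), 1)

def benchmark_score(prompts, generate) -> float:
    """Benchmark aggregate: average reward across a fixed prompt set."""
    return mean(hps_v2_score(p, generate(p)) for p in prompts)

prompts = ["a red fox in the snow", "concept art of a floating city"]
fake_generator = lambda p: p  # a real generator returns an image, not text
print(round(benchmark_score(prompts, fake_generator), 3))
```

The key property is that the benchmark number is just a mean over a standard prompt list, so it is only comparable between models scored on the same prompt set.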
No scores yet for this benchmark.
Not enough scored models yet.
Yes, like any learned reward model, it can be gamed. Models explicitly trained against HPS v2 can win the benchmark while producing images that feel less natural to humans. Always sanity-check with at least one fixed, non-learned evaluation such as GenEval.
Based on score correlations across our database.