A reward model that judges text-image alignment, fidelity, and aesthetic quality with a single combined score.
ImageReward bundles three signals into a single reward score: does the image match the prompt (alignment), is it free of visual distortions and artifacts (fidelity), and is it aesthetically pleasing. The model is also widely used to fine-tune diffusion generators via reward-weighted training.
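As a rough illustration of reward-weighted training, one common scheme converts per-sample rewards into normalized loss weights so that higher-reward generations contribute more to the update. The function below is a hypothetical sketch (names and the exponential weighting are assumptions, not the specific scheme used by any particular ImageReward fine-tuning recipe):

```python
import math

def reward_weights(rewards, beta=1.0):
    """Turn per-sample rewards into normalized training weights.

    Sketch only: exponentiate rewards with temperature beta and
    normalize to sum to 1, so higher-reward samples are upweighted.
    """
    exps = [math.exp(beta * r) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]

# The highest-reward sample receives the largest weight.
weights = reward_weights([0.2, 1.5, -0.3])
```

In practice these weights would multiply the per-sample diffusion loss before averaging.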
The reward model was trained on hundreds of thousands of human preferences plus structured ratings on alignment, fidelity, and aesthetics. The benchmark score is the mean reward across a standard prompt set.
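The scoring described above reduces to a simple mean. A minimal sketch, assuming a scorer with a `score_fn(prompt, image) -> float` signature (the function name and setup here are illustrative, not the benchmark's actual harness):

```python
def benchmark_score(score_fn, prompts, images):
    """Mean reward across a prompt set.

    score_fn(prompt, image) -> float reward, e.g. a reward-model
    scoring call applied to each generated image.
    """
    rewards = [score_fn(p, img) for p, img in zip(prompts, images)]
    return sum(rewards) / len(rewards)

# Dummy scorer standing in for a real reward model.
demo = benchmark_score(lambda p, i: len(p) % 3,
                       ["a", "bb", "ccc"],
                       ["img1", "img2", "img3"])
```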
No scores yet for this benchmark.
Not enough scored models yet.
The two correlate strongly. Use ImageReward if you care about a balance of prompt alignment and aesthetics; use HPS v2 if you want the closest single proxy to live human voting. Most benchmark reports include both as a sanity check.
Based on score correlations across our database.