Head-to-head ranking for models that turn a screenshot or mockup into a working web app.
Image-to-WebDev tests one of the most-requested AI-coding workflows: paste a screenshot of a UI, get a working clone. The model receives an input image plus an optional natural-language hint, then produces a runnable web app. Voters compare two anonymous reproductions of the same source image and pick the one that looks and behaves closer to the original. The benchmark stresses three things at once: image understanding, code generation, and visual taste.
The model is given the reference image and produces code; the generated app is rendered in a sandbox and shown side-by-side with another model's attempt, with both model identities hidden from voters. Fitting a Bradley-Terry model to the pairwise wins yields an Elo-style rating, which we normalize to 0–100.
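The page doesn't specify how the sandbox is built. One common way to isolate untrusted generated UIs in a browser is a sandboxed iframe; the sketch below assumes each submission is a single self-contained HTML file, and `renderSubmission` and its parameters are illustrative names, not the benchmark's actual harness.

```typescript
// Mount untrusted generated HTML in an isolated iframe for side-by-side voting.
function renderSubmission(html: string, mount: HTMLElement): void {
  const frame = document.createElement("iframe");
  // "allow-scripts" without "allow-same-origin": the generated app can run
  // its own JavaScript but gets an opaque origin, so it cannot read the
  // parent page, its cookies, or local storage.
  frame.sandbox.add("allow-scripts");
  frame.srcdoc = html;
  frame.style.width = "100%";
  frame.style.height = "600px";
  mount.appendChild(frame);
}
```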
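To make the rating step concrete, here is a minimal sketch of fitting Bradley-Terry strengths from a pairwise win matrix with the standard MM update (Hunter 2004), then min-max scaling the log-strengths to 0–100. The leaderboard's exact normalization isn't stated, so the scaling here is an assumption, and `bradleyTerryScores` is an illustrative name.

```typescript
// Fit Bradley-Terry strengths from pairwise wins via the MM algorithm,
// then map the log-strengths onto a 0-100 scale.
// Assumes n >= 2 models and that every model has at least one win and one
// loss; otherwise a strength collapses to 0 and the scaling divides by zero.
function bradleyTerryScores(wins: number[][], iters = 500): number[] {
  const n = wins.length;
  const totalWins = wins.map(row => row.reduce((a, b) => a + b, 0)); // W_i
  let p: number[] = new Array(n).fill(1);
  for (let t = 0; t < iters; t++) {
    const next = p.slice();
    for (let i = 0; i < n; i++) {
      // MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
      let denom = 0;
      for (let j = 0; j < n; j++) {
        if (j === i) continue;
        const games = wins[i][j] + wins[j][i]; // n_ij, comparisons of i vs j
        if (games > 0) denom += games / (p[i] + p[j]);
      }
      if (denom > 0) next[i] = totalWins[i] / denom;
    }
    const sum = next.reduce((a, b) => a + b, 0);
    p = next.map(x => x / sum); // strengths are scale-free; pin the scale
  }
  const ratings = p.map(Math.log); // Elo-style: ratings live on a log scale
  const lo = Math.min(...ratings);
  const hi = Math.max(...ratings);
  return ratings.map(r => (100 * (r - lo)) / (hi - lo));
}

// Hypothetical three-model example: wins[i][j] = times model i beat model j.
const wins = [
  [0, 7, 9],
  [3, 0, 6],
  [1, 4, 0],
];
console.log(bradleyTerryScores(wins).map(s => s.toFixed(1)));
```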
WebDev Arena starts from a text prompt. Image-to-WebDev starts from a reference image. The skills overlap, but vision-capable models with strong layout reasoning have a much bigger edge on Image-to-WebDev.
Less directly. But it is a strong predictor of how well a model can interpret design feedback delivered as images (Figma frames, whiteboard photos, hand-drawn sketches), which is the same skill in a different package.
Based on score correlations across our database.