Broad-domain knowledge benchmark that tests recall across business, science, history, and culture.
Omniscience tests breadth of knowledge across business, science, history, culture, and everyday facts. It is designed to complement GPQA and HLE, which focus on narrow expert reasoning, by measuring whether a model has the kind of broad knowledge a generalist user expects.
Questions are short-answer or multiple-choice and span many domains. The score is the percentage of questions answered correctly, with per-domain breakdowns available for diagnostics.
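As a rough illustration of percent-correct scoring with a per-domain breakdown, here is a minimal Python sketch. It is not the benchmark's actual harness: the function name `score_results`, the `(domain, is_correct)` input shape, and the sample data are all hypothetical. It also assumes the overall score is taken over all questions rather than averaged across domains, which this description does not specify.

```python
from collections import defaultdict


def score_results(results):
    """Return (overall_percent, per_domain_percent) for graded answers.

    `results` is an iterable of (domain, is_correct) pairs. This input
    shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)   # questions seen per domain
    correct = defaultdict(int)  # correct answers per domain
    for domain, is_correct in results:
        totals[domain] += 1
        correct[domain] += int(is_correct)

    n = sum(totals.values())
    if n == 0:
        raise ValueError("no graded questions")

    # Assumption: overall score is computed over all questions,
    # not as a mean of the per-domain scores.
    overall = 100.0 * sum(correct.values()) / n
    per_domain = {d: 100.0 * correct[d] / totals[d] for d in totals}
    return overall, per_domain


# Hypothetical graded answers, invented for this example.
graded = [
    ("science", True),
    ("science", False),
    ("history", True),
    ("business", True),
]
overall, per_domain = score_results(graded)
print(overall)     # 75.0
print(per_domain)  # {'science': 50.0, 'history': 100.0, 'business': 100.0}
```

If domains differ widely in size, the overall figure and the mean of the per-domain figures can diverge, so it is worth checking which one a reported score refers to.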
No scores yet for this benchmark.
Not enough scored models yet.
Based on score correlations across our database.