Benchmarks · 2024

AA Omniscience: Artificial Analysis Omniscience Benchmark

Name: AA Omniscience: Artificial Analysis Omniscience Benchmark
Creator: Artificial Analysis
Published: 2024

Broad-domain knowledge benchmark that tests recall across business, science, history, and culture.

Open Dataset

Models Tested: 0

Top Score: none yet

Published: 2024

Source: Artificial Analysis

How It Works

Omniscience tests breadth of knowledge across business, science, history, culture, and everyday facts. It is designed to complement GPQA and HLE (Humanity's Last Exam), which focus on narrow expert reasoning, by measuring whether a model has the broad knowledge a generalist user expects.

Questions are short-answer or multiple-choice and span many domains. Scoring is percent correct, with per-domain breakdowns available for diagnostics.
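The scoring rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not Artificial Analysis's harness: the page does not describe how short answers are graded, so the sketch assumes each answer already carries a correct/incorrect flag, and it assumes every question counts equally, so the overall score pools all questions rather than averaging the domain scores. `GradedAnswer`, `percent_correct`, `per_domain_breakdown`, and the domain labels are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    domain: str    # hypothetical domain label, e.g. "science" or "history"
    correct: bool  # assumed to be set by an upstream grader


def percent_correct(answers):
    """Share of answers marked correct, as a percentage (each question weighted equally)."""
    if not answers:
        # Avoid reporting 0.0 for "no data", which would look like "all wrong".
        raise ValueError("no graded answers")
    return 100.0 * sum(a.correct for a in answers) / len(answers)


def per_domain_breakdown(answers):
    """Percent correct computed separately within each domain."""
    by_domain = defaultdict(list)
    for a in answers:
        by_domain[a.domain].append(a)
    return {domain: percent_correct(items) for domain, items in sorted(by_domain.items())}
```

For example, four graded answers with a single miss in science give 75.0 overall and 50.0 for science. Because the overall figure is pooled, a domain with many questions moves it more than a small one; a per-domain breakdown is what exposes weak areas that the single number hides.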

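Dataset size: not stated. The benchmark is described as a curated knowledge set covering diverse domains beyond traditional academic subjects.

Mean score: n/a (no models scored yet)

Median score: n/a (no models scored yet)

Open / Closed models scored: 0 / 0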

Top Scorers

No scores yet for this benchmark.

Score Distribution

Not enough scored models yet.

Most Correlated Benchmarks

Not enough scored models yet.

What It Captures Well

Captures the kind of breadth users actually expect from a frontier assistant.
Complements narrow expert benchmarks like GPQA.
Run consistently by one team under one harness.

Where It Falls Short

Less transparent (closed) methodology than academic benchmarks.
Subject mix is not formally specified.
Less useful for narrow technical use cases.

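Related Benchmarks

Related benchmarks are ranked by score correlation (Pearson r) across our database; no correlation values are available yet because too few models have been scored.

GPQA
MMLU-PRO
GSM8K
SWE-Verified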
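The correlation ranking in this section relies on Pearson r between two benchmarks' scores. The page does not say how models are matched or how many are required, so the sketch below is only an illustration under assumptions: models are paired by a shared identifier, only models scored on both benchmarks are used, and `pearson_r`, `paired_scores`, and the model ids are hypothetical names.

```python
import math


def paired_scores(a, b):
    """Keep only models scored on both benchmarks, in a stable (sorted) order."""
    shared = sorted(a.keys() & b.keys())
    return [a[m] for m in shared], [b[m] for m in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists covering at least 2 models")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one benchmark's scores are constant")
    return cov / math.sqrt(var_x * var_y)
```

For instance, `pearson_r([1, 2, 3], [2, 4, 6])` returns 1.0. The guard clauses show why a benchmark with no scored models cannot yet report a correlation: with fewer than two shared models, or identical scores, r is undefined, and in practice a much larger sample is needed before r is meaningful (the threshold this site uses is not stated).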

The AI Build Report

The state of AI models, API prices, and what to run where. New every month, free.

