The standard accuracy metric for speech-to-text: lower is better.
WER (word error rate) is the percentage of words in a transcript that differ from the human reference: each substitution, deletion, and insertion counts as one error. Lower is better: a WER of 5% means roughly 5 errors for every 100 words in the reference.
For each audio clip, the ASR model produces a transcript, which is compared word-for-word to the reference. The substitutions, deletions, and insertions are summed and divided by the number of words in the reference. The Open ASR Leaderboard reports WER averaged across a basket of public datasets to control for dataset-specific quirks.
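The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual scoring code: the function names (`word_errors`, `corpus_wer`, `macro_average`) are hypothetical, text normalization is reduced to lowercasing and whitespace splitting (an assumption; real evaluations usually normalize punctuation and numbers as well), and the cross-dataset average is assumed to be a plain unweighted mean.

```python
from __future__ import annotations


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the hypothesis into the reference (word-level
    Levenshtein distance)."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total errors divided by
    total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        # Assumption: simple normalization (lowercase + whitespace split).
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_errors(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("reference text is empty")
    return errors / ref_words


def macro_average(per_dataset_wer: dict[str, float]) -> float:
    """Unweighted mean of per-dataset WER values (assumed averaging scheme)."""
    return sum(per_dataset_wer.values()) / len(per_dataset_wer)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so `corpus_wer` returns 2/6. Because insertions also count, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.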
| # | Model | Lab | Source | WER |
|---|---|---|---|---|
| 01 | Fish Speech v1.4 | Fish Audio | Open | 1.0% |
| 02 | Fish Speech v1.5 | Fish Audio | Open | 1.0% |
| 03 | Cohere Transcribe (03-2026) | Cohere | Open | 5.4% |
| 04 | Granite 4.0 1B Speech | IBM | Open | 5.5% |
| 05 | NVIDIA Canary-Qwen 2.5B | NVIDIA | Open | 5.6% |
| 06 | Granite Speech 3.3 8B | IBM | Open | 5.7% |
| 07 | Qwen3-ASR-1.7B | Alibaba | Open | 5.8% |
| 08 | Granite Speech 3.3 2B | IBM | Open | 6.0% |
| 09 | Phi-4-multimodal-instruct | Microsoft | Open | 6.0% |
| 10 | NVIDIA Parakeet TDT 0.6B v2 | NVIDIA | Open | 6.0% |
| 11 | NVIDIA Parakeet TDT 0.6B v3 | NVIDIA | Open | 6.3% |
| 12 | NVIDIA Canary 1B Flash | NVIDIA | Open | 6.3% |
| 13 | Kyutai STT 2.6B EN | Kyutai | Open | 6.4% |
| 14 | Qwen3-ASR-0.6B | Alibaba | Open | 6.4% |
| 15 | NVIDIA Canary 1B | NVIDIA | Open | 6.5% |

On clean read speech, top systems score under 2% WER. On real conversational audio with accents and background noise, even the best systems land between 5% and 12%. Anything above 15% in production will feel unreliable to users.

A lower WER is generally better, but watch for over-optimization on a single dataset. A model that reaches very low WER on LibriSpeech but stays high on Common Voice may be brittle to accents. The Open ASR Leaderboard averages across datasets to expose this.
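One way to spot a model that is tuned to a single dataset is to look at the spread of its per-dataset WER rather than only the average. The sketch below is illustrative only: the function names, the dataset keys, the WER values, and the 10-point threshold are all hypothetical and are not taken from the leaderboard.

```python
from __future__ import annotations


def wer_spread(per_dataset_wer: dict[str, float]) -> float:
    """Gap between the worst and best per-dataset WER (as fractions)."""
    values = list(per_dataset_wer.values())
    return max(values) - min(values)


def looks_brittle(per_dataset_wer: dict[str, float],
                  max_spread: float = 0.10) -> bool:
    """Flag a model whose WER varies widely across datasets.

    max_spread is an arbitrary illustrative threshold, not a standard.
    """
    return wer_spread(per_dataset_wer) > max_spread


# Hypothetical scores: strong on clean read speech, weak on accented speech.
example = {"librispeech_clean": 0.02, "common_voice": 0.15}
print(looks_brittle(example))
```

A model with a low average but a large spread deserves a closer look at the datasets where it does worst before it is trusted on accented or noisy audio.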