Text-to-speech and ASR models ranked by TTS Arena preference, WER, and MOS.
Benchmark data last synced Apr 25, 2026
| Tier | Score | Organization | Kind | | Parameters | Size |
|---|---|---|---|---|---|---|
| S | 85.6 | | asr | 87 | 0.6B | 0.9 GB |
| S | 85.3 | | asr | 88 | 0.6B | 0.9 GB |
| A | 83.1 | Alibaba Qwen | asr | 87 | 0.6B | 0.9 GB |
| A | 75.3 | NVIDIA | asr | 87 | 0.883B | 1.0 GB |
| A | 74.6 | NVIDIA | asr | 86 | 0.182B | 0.6 GB |
| A | 74.4 | Useful Sensors | asr | 87 | 0.245B | 0.7 GB |
| A | 71.9 | NVIDIA | asr | 86 | 0.978B | 1.1 GB |
| A | 71.8 | Alibaba Qwen | asr | 88 | 1.7B | 1.5 GB |
| B | 69.7 | | asr | 89 | 2B | 1.7 GB |
| B | 68.9 | Hugging Face | asr | 86 | 0.8B | 1.0 GB |
| B | 67.7 | NVIDIA | asr | 85 | 1.1B | 1.2 GB |
| B | 67.7 | NVIDIA | asr | 87 | 1B | 1.1 GB |
| B | 65.3 | Mistral AI | asr | 86 | 3B | 2.3 GB |
| B | 64.9 | | asr | 86 | 1.5B | 1.4 GB |
| B | 64.9 | Microsoft | asr | 88 | 5.6B | 3.9 GB |