
Cohere Labs' first open-source voice model: a 2B-parameter dedicated ASR transformer that took the #1 spot on the Hugging Face Open ASR Leaderboard (5.42 average WER) at release, with support for 14 enterprise languages.
A solid 2B-parameter dense audio model from Cohere. Treat the modality benchmarks above as the leading indicator of fit — composite scoring across modalities is still maturing.
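For readers unfamiliar with the metric behind the leaderboard figure, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration, not the leaderboard's scoring code: it assumes whitespace tokenization and applies no text normalization, which public leaderboards typically do before scoring. The function name word_error_rate is chosen here for illustration. A reported score of 5.42 is presumably a percentage, that is, roughly 5.4 word errors per 100 reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the returned fraction by 100 gives the percentage form in which WER is usually reported.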
cohere-transcribe-03-2026 is Cohere Labs' first audio model: a dedicated 2B-parameter automatic speech recognition model (audio in, text out), trained from scratch with a supervised cross-entropy objective rather than distilled from Whisper.
Supported languages: English, German, French, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Arabic, Vietnamese, Chinese (Mandarin), Japanese, Korean. There is no automatic language detection, so the language must be specified explicitly for every transcription.
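Because the language is never detected automatically, it can help to validate the caller's choice before any audio is sent to the model. The following is a minimal sketch of such a check. The mapping to ISO 639-1 codes and the helper name resolve_language are assumptions made for illustration; the model's documentation should be consulted for the exact identifiers it accepts.

```python
# The 14 supported languages, mapped to ISO 639-1 codes.
# Assumption: the model accepts these codes; this is not confirmed by the source.
SUPPORTED_LANGUAGES = {
    "English": "en", "German": "de", "French": "fr", "Italian": "it",
    "Spanish": "es", "Portuguese": "pt", "Greek": "el", "Dutch": "nl",
    "Polish": "pl", "Arabic": "ar", "Vietnamese": "vi",
    "Chinese (Mandarin)": "zh", "Japanese": "ja", "Korean": "ko",
}


def resolve_language(language: str) -> str:
    """Return the ISO 639-1 code for a supported language name or code.

    Raises ValueError for empty or unsupported input, since the language
    must always be given explicitly.
    """
    key = language.strip().lower()
    if not key:
        raise ValueError("a language must be specified explicitly")
    if key in SUPPORTED_LANGUAGES.values():
        return key
    for name, code in SUPPORTED_LANGUAGES.items():
        if name.lower() == key:
            return code
    raise ValueError(f"unsupported language: {language!r}")
```

Failing early with a clear error is usually preferable to letting a request through with a missing or unsupported language and getting a poor transcript back.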
Supported natively in transformers (via CohereAsrForConditionalGeneration) and in vLLM (via the /v1/audio/transcriptions endpoint), and runs on Apple Silicon, in the browser, and on mobile. 18 quantized variants are available on the Hub.
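As a rough illustration of the vLLM route, the sketch below posts an audio file to a /v1/audio/transcriptions endpoint. Several details are assumptions rather than facts from the source: that the endpoint takes OpenAI-style multipart form fields (file, model, language) and returns JSON with a "text" key, that the server listens at http://localhost:8000, and that the model is served under the ID shown in the usage note. The helper names are hypothetical, and the third-party requests package is needed only for the actual upload.

```python
def build_transcription_request(base_url: str, model: str, language: str):
    """Return (url, form_fields) for a transcription request.

    Assumes OpenAI-style form fields; the language is mandatory because
    the model does not detect it automatically.
    """
    if not language.strip():
        raise ValueError("language must be specified explicitly")
    url = base_url.rstrip("/") + "/v1/audio/transcriptions"
    return url, {"model": model, "language": language.strip()}


def transcribe(audio_path: str, base_url: str, model: str, language: str) -> str:
    """Upload an audio file and return the transcript text."""
    import requests  # third-party; imported lazily so the builder works without it

    url, fields = build_transcription_request(base_url, model, language)
    with open(audio_path, "rb") as audio:
        response = requests.post(url, data=fields, files={"file": audio}, timeout=300)
    response.raise_for_status()
    # Assumption: the response body is JSON with a "text" field.
    return response.json()["text"]
```

With a server running, a call might look like transcribe("meeting.wav", "http://localhost:8000", "CohereLabs/cohere-transcribe-03-2026", "en"), where the file name, address, and model ID are placeholders to adapt to the actual deployment.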