NVIDIA Canary 180M Flash is a compact 182M-parameter multilingual encoder-decoder ASR and translation model supporting 4 languages with >1200 RTFx inference speed, designed for mobile and edge deployment.
Canary-180M-Flash is the smallest member of the NVIDIA NeMo Canary Flash family, with 182 million parameters and inference speed of more than 1200 RTFx on open-asr-leaderboard datasets. It supports ASR in 4 languages (English, German, French, Spanish) and bidirectional translation between English and the other three languages, with optional punctuation and capitalization (PnC). It also offers experimental word-level and segment-level timestamps.
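To put the throughput figure in concrete terms: RTFx is the inverse real-time factor, i.e. seconds of audio processed per second of compute. The sketch below is plain arithmetic, not part of NeMo; the helper names `rtfx` and `seconds_to_process` are hypothetical, and actual throughput will depend on hardware and batch size.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration divided by processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def seconds_to_process(audio_seconds: float, rtfx_value: float) -> float:
    """Compute time needed for a given amount of audio at a given RTFx."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


if __name__ == "__main__":
    one_hour = 3600.0
    # At an RTFx of 1200, one hour of audio takes about 3 seconds of compute.
    print(seconds_to_process(one_hour, 1200.0))
    # Conversely, transcribing one hour of audio in 3 seconds is an RTFx of 1200.
    print(rtfx(one_hour, 3.0))
```

The quoted >1200 RTFx was measured on open-asr-leaderboard datasets, so it is best read as an upper reference point rather than a guarantee for edge devices.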
Architecture: Encoder-decoder with FastConformer encoder and Transformer decoder, based on the Canary Flash architecture. Uses a concatenated SentencePiece tokenizer.
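As a rough illustration of what a concatenated tokenizer can look like, the toy sketch below joins per-language vocabularies into one ID space by giving each language its own offset. This is an assumption about the general idea, not NeMo's implementation: the class `ToyConcatTokenizer` and its tiny vocabularies are hypothetical, and a real setup would wrap trained SentencePiece models rather than word lists.

```python
from typing import Dict, List


class ToyConcatTokenizer:
    """Toy example: per-language vocabularies share one ID space via offsets."""

    def __init__(self, vocabs: Dict[str, List[str]]) -> None:
        self.vocabs = vocabs
        self.offsets: Dict[str, int] = {}
        next_id = 0
        for lang, vocab in vocabs.items():
            # Each language's IDs start where the previous language's ended.
            self.offsets[lang] = next_id
            next_id += len(vocab)
        self.vocab_size = next_id

    def encode(self, tokens: List[str], lang: str) -> List[int]:
        """Map tokens to global IDs using the chosen language's vocabulary."""
        vocab = self.vocabs[lang]
        return [self.offsets[lang] + vocab.index(tok) for tok in tokens]

    def decode(self, ids: List[int]) -> List[str]:
        """Map global IDs back to tokens by locating the owning language."""
        tokens = []
        for global_id in ids:
            for lang, vocab in self.vocabs.items():
                local_id = global_id - self.offsets[lang]
                if 0 <= local_id < len(vocab):
                    tokens.append(vocab[local_id])
                    break
            else:
                raise ValueError(f"id {global_id} is out of range")
        return tokens


if __name__ == "__main__":
    tok = ToyConcatTokenizer({"en": ["▁hello", "▁world"], "de": ["▁hallo", "▁welt"]})
    ids = tok.encode(["▁hallo", "▁welt"], "de")
    print(ids, tok.decode(ids), tok.vocab_size)
```

With this layout the decoder has a single output vocabulary, while each language keeps its own subword inventory.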
Training: Trained using the NVIDIA NeMo framework for 219K steps with 2D bucketing and OOMptimizer on 32 NVIDIA A100 80GB GPUs.
Use cases: On-device speech recognition and translation (e.g., smartphones), real-time translation earbuds, low-latency voice assistants, and applications where privacy or offline use is required. Released under the CC-BY-4.0 license, which permits commercial use with attribution.