NVIDIA Canary 1B Flash is an 883M-parameter multilingual encoder-decoder model for ASR and speech translation. It supports 4 languages and runs at more than 1000 RTFx.
Canary-1B-Flash is part of the NVIDIA NeMo Canary Flash family, a faster and more accurate variant of Canary-1B. It has 883 million parameters and reaches an inference speed of more than 1000 RTFx (seconds of audio processed per second of compute) on the open-asr-leaderboard datasets. It supports ASR in 4 languages (English, German, French, Spanish) and translation in both directions between English and each of German, French, and Spanish, with optional punctuation and capitalization (PnC). Word-level and segment-level timestamps are available as an experimental feature.
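The language support described above can be sketched as a small validity check. This is only an illustration of the stated rules (same-language pairs are ASR, pairs with English on exactly one side are translation). The names `SUPPORTED_LANGS` and `resolve_task` are hypothetical and not part of NeMo or the model's API.

```python
# Hypothetical helper: encodes the language pairs listed in the description.
# Language codes (en, de, fr, es) are an assumption made for illustration.
SUPPORTED_LANGS = {"en", "de", "fr", "es"}


def resolve_task(source_lang: str, target_lang: str) -> str:
    """Return 'asr' or 'translation' for a supported pair, else raise ValueError."""
    src, tgt = source_lang.lower(), target_lang.lower()
    if src not in SUPPORTED_LANGS or tgt not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language pair: {src} -> {tgt}")
    if src == tgt:
        # Same language on both sides means plain transcription.
        return "asr"
    if "en" in (src, tgt):
        # Translation is only described to or from English.
        return "translation"
    raise ValueError(f"translation needs English on one side: {src} -> {tgt}")


if __name__ == "__main__":
    print(resolve_task("de", "de"))
    print(resolve_task("en", "fr"))
```

A pair such as German to French is rejected here because the description only covers translation to or from English.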
Architecture: Encoder-decoder model with a FastConformer encoder (32 layers) and a Transformer decoder (4 layers), totaling 883M parameters. The decoder is prompted with task tokens such as <target language>, <task>, <toggle timestamps>, and <toggle PnC>. The model uses a concatenated SentencePiece tokenizer.
Training: Trained using the NVIDIA NeMo Framework for 200K steps with 2D bucketing and OOMptimizer on 128 NVIDIA A100 80GB GPUs.
Use cases: High-throughput transcription, real-time translation, subtitle/caption generation, and timestamped transcription workflows. Released under the CC-BY-4.0 license, which permits commercial use.
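For planning high-throughput transcription, the RTFx figure can be turned into a rough processing-time estimate. The sketch below assumes RTFx is audio duration divided by processing time, and uses the quoted benchmark value of 1000 as a stand-in. Actual throughput depends on hardware, batch size, and audio characteristics, and the function names are illustrative only.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, rtfx_value: float) -> float:
    """Estimate compute time for a given amount of audio at a given RTFx."""
    return audio_seconds / rtfx_value


if __name__ == "__main__":
    ten_hours = 10 * 3600  # 36,000 seconds of audio
    # At the benchmark figure of 1000 RTFx (an assumption for this estimate):
    print(estimated_processing_seconds(ten_hours, 1000))  # 36.0
```

At 1000 RTFx, ten hours of audio would take roughly 36 seconds of compute, before accounting for data loading and other overhead.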