NVIDIA Parakeet RNNT 1.1B is an XXL FastConformer RNN-Transducer English ASR model jointly developed by NVIDIA NeMo and Suno.ai, offering strong accuracy and streaming-capable inference.
Parakeet-RNNT-1.1B is an automatic speech recognition (ASR) model that transcribes English speech into lowercase text. Jointly developed by NVIDIA NeMo and Suno.ai, it is the XXL variant of the FastConformer Transducer (~1.1B parameters). At its release in early 2024, it topped the Hugging Face Open ASR Leaderboard together with Parakeet CTC, surpassing Whisper.
Architecture: FastConformer encoder (an optimized Conformer with 8x depthwise-separable convolutional downsampling) with an RNN-Transducer (RNNT) decoder trained with transducer loss in a multitask setup. Supports streaming inference.
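To make the effect of 8x downsampling concrete, the following is a rough sketch of the time resolution it implies. It assumes a 10 ms feature hop, which is a common ASR front-end setting but is not stated above, and it ignores padding, so real frame counts may differ by a frame or two. The function names are illustrative only.

```python
# Assumption: the acoustic front end emits one feature frame every 10 ms.
# This value is NOT given in the description above; adjust it if the real
# preprocessor is configured differently.
FEATURE_HOP_MS = 10

# From the description: the FastConformer encoder downsamples by 8x.
DOWNSAMPLING_FACTOR = 8


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def encoder_frame_ms(hop_ms: int = FEATURE_HOP_MS,
                     factor: int = DOWNSAMPLING_FACTOR) -> int:
    """Duration of audio covered by one encoder output frame."""
    return hop_ms * factor


def approx_encoder_frames(audio_seconds: float,
                          hop_ms: int = FEATURE_HOP_MS,
                          factor: int = DOWNSAMPLING_FACTOR) -> int:
    """Approximate number of encoder frames the decoder has to process."""
    audio_ms = round(audio_seconds * 1000)
    feature_frames = ceil_div(audio_ms, hop_ms)
    return ceil_div(feature_frames, factor)
```

Under these assumptions each encoder frame spans 80 ms, so a 10-second clip yields roughly 125 encoder frames rather than 1000 feature frames, which is where much of the encoder's speed advantage comes from.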
Training: Trained with the NVIDIA NeMo toolkit for several hundred epochs on a large multi-domain English corpus (LibriSpeech, Fisher, Switchboard, WSJ-0/1, Common Voice 8.0, Singapore's National Speech Corpus Parts 1 and 6, VCTK, VoxPopuli, Europarl-ASR, Multilingual LibriSpeech, People's Speech) plus proprietary data.
Use cases: Streaming English ASR, voice assistants, call-center transcription, captioning, and as a base for fine-tuning. Accepts 16 kHz mono-channel audio (WAV) as input.
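Because the model expects 16 kHz mono WAV input, it can help to validate files before transcription. The sketch below checks a file with Python's standard wave module and then hands valid files to NeMo. The model identifier "nvidia/parakeet-rnnt-1.1b" and the helper names check_wav and transcribe_files are assumptions made for illustration; the NeMo import is deferred so the check works without NeMo installed.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as stated above
EXPECTED_CHANNELS = 1         # mono, as stated above

# Assumption: the checkpoint name on the model hub; not given in the text above.
MODEL_NAME = "nvidia/parakeet-rnnt-1.1b"


def check_wav(path: str) -> list:
    """Return a list of format problems; an empty list means the file fits."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    return problems


def transcribe_files(paths: list):
    """Transcribe only the files that pass the format check (hypothetical helper)."""
    valid = [p for p in paths if not check_wav(p)]
    if not valid:
        return []

    # Imported lazily: requires the NeMo toolkit with its ASR collection.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=MODEL_NAME)
    return model.transcribe(valid)
```

Files that fail the check would need resampling or downmixing with an audio tool before being passed in; this sketch simply skips them.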