NVIDIA Parakeet CTC 1.1B is an XXL FastConformer-CTC English ASR model jointly developed by NVIDIA NeMo and Suno.ai, offering strong non-autoregressive speech recognition accuracy with efficient inference.
Parakeet-CTC-1.1B is an ASR model that transcribes speech into lower-case English text. Jointly developed by NVIDIA NeMo and Suno.ai, it is an XXL variant of the FastConformer-CTC architecture with roughly 1.1 billion parameters.
Architecture: A FastConformer encoder (an optimized Conformer with 8x depthwise-separable convolutional downsampling) paired with a linear CTC decoder. Because CTC decoding is non-autoregressive, all output frames are predicted in parallel rather than token by token, making inference very efficient. The model can also be run natively via 🤗 Transformers (ParakeetForCTC).
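To illustrate why CTC inference is cheap: the decoder emits one token per encoder frame independently, and the final transcript is recovered by collapsing consecutive repeats and dropping a special blank token. A minimal, self-contained sketch of this greedy collapse step (the `_` blank symbol and character tokens are illustrative assumptions, not the model's actual vocabulary):

```python
BLANK = "_"  # assumed blank symbol for illustration

def ctc_greedy_decode(frame_tokens):
    """Collapse a per-frame best-path token sequence into a transcript:
    merge consecutive duplicates, then drop blanks."""
    out = []
    prev = None
    for tok in frame_tokens:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return "".join(out)

# Frames: h h e e l _ _ l l l _ o  ->  collapse repeats, drop blanks
print(ctc_greedy_decode("hheel__lll_o"))  # -> "hello"
```

Note how the blank between the two `l` runs prevents them from merging, which is how CTC represents doubled letters.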
Training: Trained using the NVIDIA NeMo toolkit on a large multi-domain English corpus including LibriSpeech, Fisher, Switchboard, WSJ, Common Voice, VCTK, VoxPopuli, Europarl, Multilingual LibriSpeech, and People's Speech, along with a large private corpus.
Use cases: Low-latency English ASR, transcription of long-form audio, voice interfaces, and a base model for domain-specific fine-tuning. Accepts 16 kHz mono-channel audio (WAV) as input.
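Because the model expects 16 kHz mono input, audio captured at other sample rates or in stereo needs downmixing and resampling first. A minimal NumPy sketch using naive linear interpolation (a production pipeline would typically use torchaudio, librosa, or soxr instead; `to_mono_16k` is a hypothetical helper name):

```python
import numpy as np

TARGET_SR = 16_000  # Parakeet expects 16 kHz mono audio

def to_mono_16k(samples, sr):
    """Downmix multi-channel audio to mono and resample to 16 kHz.

    Naive sketch: channels are averaged, and resampling is plain
    linear interpolation (no anti-aliasing filter).
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 2:  # shape (num_samples, channels) -> mono
        samples = samples.mean(axis=1)
    if sr != TARGET_SR:
        n_out = int(round(len(samples) * TARGET_SR / sr))
        x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        samples = np.interp(x_new, x_old, samples)
    return samples.astype(np.float32)

# One second of 44.1 kHz stereo -> 16 000 mono float32 samples
out = to_mono_16k(np.zeros((44_100, 2)), 44_100)
print(out.shape)  # -> (16000,)
```

Once the audio is in this form, transcription via NeMo follows the toolkit's usual pattern of loading the checkpoint with `from_pretrained` and calling `transcribe` on a list of WAV paths.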