NVIDIA Parakeet CTC 1.1B is an XXL FastConformer-CTC English ASR model jointly developed by NVIDIA NeMo and Suno.ai, offering strong speech recognition accuracy with efficient non-autoregressive inference.
Parakeet-CTC-1.1B is an ASR model that transcribes speech into lower-case English text. Jointly developed by NVIDIA NeMo and Suno.ai, it is an XXL version of the FastConformer CTC architecture (~1.1B parameters).
Architecture: FastConformer encoder (an optimized Conformer with 8x depthwise-separable convolutional downsampling) paired with a linear CTC decoder. Because CTC decoding is non-autoregressive, inference is efficient. The model can also be run natively via 🤗 Transformers (ParakeetForCTC).
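A minimal transcription sketch via the NeMo toolkit, the primary distribution path for this checkpoint (assumptions: `nemo_toolkit[asr]` is installed, and `sample.wav` is a placeholder path to a 16 kHz mono WAV file):

```python
# Minimal NeMo inference sketch for Parakeet-CTC-1.1B.
# Assumes: pip install "nemo_toolkit[asr]"; "sample.wav" is a placeholder
# 16 kHz mono WAV file.
import nemo.collections.asr as nemo_asr

# Download and load the pretrained checkpoint by name.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/parakeet-ctc-1.1b"
)

# Single-pass, non-autoregressive CTC transcription (greedy decoding by default).
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```

Depending on the NeMo version, `transcribe` returns plain strings or hypothesis objects, but the call pattern above is the same either way.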
Training: Trained using the NVIDIA NeMo toolkit on a large multi-domain English corpus including LibriSpeech, Fisher, Switchboard, WSJ, Common Voice, VCTK, VoxPopuli, Europarl, Multilingual LibriSpeech, and People's Speech, along with a large private corpus.
Use cases: Low-latency English ASR, transcription of long audio, voice interfaces, and a fine-tuning base for domain-specific ASR. Accepts 16 kHz mono-channel WAV audio as input (see the preprocessing sketch below).
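Since the model expects 16 kHz mono WAV input, arbitrary recordings usually need a resample/downmix step first. A minimal sketch (assumptions: `librosa` and `soundfile` are installed; `input.mp3` and `speech_16k.wav` are placeholder file names):

```python
# Convert an arbitrary audio file to the 16 kHz mono WAV the model expects.
# Assumes: pip install librosa soundfile; file names are placeholders.
import librosa
import soundfile as sf

# librosa resamples to 16 kHz and downmixes to mono on load.
audio, sr = librosa.load("input.mp3", sr=16000, mono=True)

# Write a 16-bit PCM WAV ready for transcription.
sf.write("speech_16k.wav", audio, sr, subtype="PCM_16")
```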