NVIDIA Parakeet TDT 1.1B is an XXL FastConformer Token-and-Duration Transducer English ASR model that is more accurate than the comparable Parakeet RNNT 1.1B and runs 64% faster.
Parakeet-TDT-1.1B is an ASR model that transcribes English speech into lower-case English text. Jointly developed by NVIDIA NeMo and Suno.ai, it is an XXL version of the FastConformer model (~1.1B parameters) paired with a novel Token-and-Duration Transducer (TDT) decoder. It topped the Hugging Face Open ASR Leaderboard in early 2024, outperforming the similarly sized Parakeet-RNNT-1.1B in accuracy while running 64% faster.
Architecture: FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder that jointly predicts each token and its duration, i.e. how many encoder frames to advance next. Because of this, the decoder can jump over blank frames instead of evaluating every one, which reduces wasted computation during recognition.
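To illustrate how duration prediction lets the decoder skip frames, here is a minimal greedy-decoding sketch. The joint function (`joint`), the blank id, the toy utterance, and the per-frame symbol cap are hypothetical placeholders chosen for illustration; this is not NeMo's implementation, which batches decoding and uses its own configured set of durations.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical blank token id

# Hypothetical joint network: given the current encoder frame index and the
# tokens emitted so far, return (token_id, duration_in_frames).
JointFn = Callable[[int, Sequence[int]], Tuple[int, int]]


def tdt_greedy_decode(num_frames: int, joint: JointFn,
                      max_symbols_per_frame: int = 10) -> Tuple[List[int], int]:
    """Greedy TDT decoding sketch; returns (emitted tokens, number of joint calls)."""
    tokens: List[int] = []
    t = 0                # current encoder frame
    joint_calls = 0
    symbols_here = 0     # tokens emitted without advancing time
    while t < num_frames:
        token, duration = joint(t, tokens)
        joint_calls += 1
        if token != BLANK:
            tokens.append(token)
            symbols_here += 1
        # Safeguard: a blank must advance time, or the loop would stall.
        if token == BLANK and duration == 0:
            duration = 1
        # Safeguard: cap emissions on one frame to guarantee termination.
        if duration == 0 and symbols_here >= max_symbols_per_frame:
            duration = 1
        if duration > 0:
            symbols_here = 0
        t += duration    # the predicted duration skips frames directly
    return tokens, joint_calls


def toy_joint(t: int, history: Sequence[int]) -> Tuple[int, int]:
    """Toy 12-frame utterance: silence, two tokens at frame 4, then silence."""
    if t < 4:
        return BLANK, 4                      # skip the leading silence in one step
    if t == 4 and len(history) == 0:
        return 7, 0                          # emit a token, stay on frame 4
    if t == 4 and len(history) == 1:
        return 8, 3                          # emit a token, jump ahead 3 frames
    return BLANK, 4                          # skip trailing silence


print(tdt_greedy_decode(12, toy_joint))      # → ([7, 8], 5)
```

On this toy input the loop makes 5 joint calls for 12 frames, whereas a conventional transducer advances one frame per blank and would need at least 12.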
Training: Trained with the NVIDIA NeMo toolkit on 64K hours of English speech, comprising 40K hours of private data and 24K hours from public corpora.
Use cases: High-throughput English ASR, transcription services, voice analytics, real-time captioning, and as a base model for fine-tuning. Accepts 16 kHz mono-channel audio (WAV) as input.
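Since the model expects 16 kHz mono WAV input, it can help to check files before sending them for transcription. The sketch below uses only Python's standard `wave` module; the function name `check_wav_input` is a hypothetical helper, and it checks only sample rate and channel count, not bit depth or content.

```python
import wave
from typing import List

EXPECTED_RATE_HZ = 16_000  # input sample rate stated for this model
EXPECTED_CHANNELS = 1      # mono


def check_wav_input(path: str) -> List[str]:
    """Return a list of format problems; an empty list means 16 kHz mono."""
    problems: List[str] = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"audio has {channels} channels, expected mono")
    return problems
```

Files that fail the check would need resampling or downmixing first. For the transcription step itself, NeMo's `ASRModel.from_pretrained(...)` followed by `transcribe(...)` is the usual entry point; consult the NeMo documentation for the exact arguments.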