NVIDIA Parakeet TDT 1.1B is an XXL FastConformer Token-and-Duration Transducer English ASR model, offering higher accuracy and 64% greater speed than the comparable Parakeet RNNT 1.1B.
Parakeet-TDT-1.1B is an ASR model that transcribes English speech into lowercase text. Jointly developed by the NVIDIA NeMo and Suno.ai teams, it is an XXL version of the FastConformer model (~1.1B parameters) paired with a Token-and-Duration Transducer (TDT) decoder. It topped the Hugging Face Open ASR Leaderboard in early 2024, outperforming the similarly sized Parakeet-RNNT-1.1B in accuracy while running 64% faster.
Architecture: FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder that jointly predicts both the token and its duration, allowing the model to skip blank frames during recognition and reduce wasted computation.
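The frame-skipping idea can be illustrated with a toy greedy decoding loop. This is a minimal sketch under stated assumptions, not NeMo's implementation: the function name `greedy_tdt_decode` and the precomputed `predictions` list are hypothetical, the per-frame outputs are fixed in advance instead of coming from a joint network conditioned on previously emitted tokens, and every step is forced to advance by at least one frame.

```python
BLANK = None  # stand-in for the transducer's blank symbol (assumption for this toy)


def greedy_tdt_decode(frame_predictions):
    """Toy greedy loop over precomputed (token, duration) pairs, one per frame.

    A real TDT decoder would query the joint network at frame t, conditioned
    on the tokens emitted so far; here the outputs are fixed for illustration.
    """
    tokens, visited = [], []
    t = 0
    while t < len(frame_predictions):
        token, duration = frame_predictions[t]
        visited.append(t)           # frames where the decoder actually did work
        if token is not BLANK:
            tokens.append(token)
        t += max(1, duration)       # jump ahead by the predicted duration
    return tokens, visited


# Ten encoder frames; most are never inspected because durations skip past them.
predictions = [
    ("he", 2), (BLANK, 1), ("llo", 4), (BLANK, 1), (BLANK, 1),
    (BLANK, 1), (BLANK, 3), (BLANK, 1), (BLANK, 1), ("world", 1),
]
tokens, visited = greedy_tdt_decode(predictions)
print(tokens)   # ['he', 'llo', 'world']
print(visited)  # [0, 2, 6, 9]
```

In this toy run the decoder evaluates 4 of the 10 frames, whereas a conventional transducer would step through every frame. That difference is the source of the reduced computation described above.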
Training: Trained with the NVIDIA NeMo toolkit on 64K hours of English speech, comprising 40K hours of private data and 24K hours from public corpora.
Use cases: High-throughput English ASR, transcription services, voice analytics, real-time captioning, and use as a base model for fine-tuning. Accepts 16 kHz mono-channel audio (WAV) as input.
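Because the model expects 16 kHz mono WAV input, it can help to check a file before sending it for transcription. The helper below is a minimal sketch using only Python's standard `wave` module; the function name `is_16khz_mono` is hypothetical and not part of NeMo.

```python
import wave


def is_16khz_mono(path):
    """Return True if the WAV file at `path` is single-channel and sampled at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before transcription.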