IBM's flagship 8B-parameter speech-language model for high-accuracy ASR and speech translation, modality-aligning Granite 3.3 8B Instruct with a conformer encoder for state-of-the-art English transcription among open models.
A workable 9B-parameter dense audio model from IBM. Treat the modality benchmarks above as the leading indicator of fit — composite scoring across modalities is still maturing.
Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.
Architecture: Two-pass speech-language model that pairs a conformer speech encoder with the Granite 3.3 8B Instruct LLM.
Modality: Audio in / text out. Supports a pure-text fallback that calls the underlying Granite LLM directly, preserving full text capabilities and safety alignment.
Training: Modality-aligned from granite-3.3-8b-instruct on public ASR/AST corpora plus synthetic translation data. This particular checkpoint was trained for 13 days on 32 H100 GPUs on IBM's Blue Vela supercomputer. Revision 3.3.2 introduced multilingual support for English, French, German, Spanish and Portuguese.
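As a rough sense of scale, the training figures above can be turned into GPU-hours. This is a back-of-the-envelope sketch: it assumes all 32 GPUs ran continuously for the full 13 days, which the source does not state, so treat the result as an upper bound.

```python
# Figures taken from the training note above.
TRAINING_DAYS = 13
NUM_GPUS = 32
HOURS_PER_DAY = 24

# Assumption: every GPU was busy for the whole period (no downtime or restarts).
gpu_hours = TRAINING_DAYS * HOURS_PER_DAY * NUM_GPUS

print(f"{gpu_hours:,} H100 GPU-hours")  # prints "9,984 H100 GPU-hours"
```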
Use cases: Enterprise-grade English ASR (outperforms several competitors trained on far more proprietary data) and English↔X speech translation for major European languages. Also suited to downstream Q&A and summarization via the Granite LLM.
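The two-pass flow and the pure-text fallback described above can be sketched as a small routing function. This is an illustrative sketch only, not IBM's API: `Request`, `run_two_pass`, `fake_transcribe`, and `fake_generate` are hypothetical names, and the two stub functions stand in for the speech model and the underlying Granite LLM so the example runs without downloading any weights.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins: in a real deployment these would wrap the speech
# model (audio -> text) and the underlying Granite LLM (text -> text).
Transcriber = Callable[[Sequence[float]], str]
TextGenerator = Callable[[str], str]


@dataclass
class Request:
    prompt: str                               # instruction for the text LLM
    audio: Optional[Sequence[float]] = None   # mono waveform samples, if any


def run_two_pass(req: Request, transcribe: Transcriber, generate: TextGenerator) -> dict:
    """Pass 1 turns audio into text; pass 2 sends that text to the LLM.

    With no audio attached, pass 1 is skipped (the pure-text fallback).
    """
    if req.audio is None:
        return {"path": "text-only", "transcript": None, "answer": generate(req.prompt)}
    transcript = transcribe(req.audio)
    answer = generate(f"{req.prompt}\n\n{transcript}")
    return {"path": "two-pass", "transcript": transcript, "answer": answer}


# Stub models (assumptions for illustration, not real inference calls).
def fake_transcribe(audio: Sequence[float]) -> str:
    return f"<transcript of {len(audio)} samples>"


def fake_generate(text: str) -> str:
    return f"LLM({text!r})"


spoken = run_two_pass(Request("Summarize:", audio=[0.0] * 16000), fake_transcribe, fake_generate)
typed = run_two_pass(Request("What is ASR?"), fake_transcribe, fake_generate)
print(spoken["path"], "|", typed["path"])
```

Keeping the transcript as an explicit intermediate value means it can be logged or reviewed before the second pass, and text-only requests never touch the audio path.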