IBM's flagship 8B-parameter speech-language model for high-accuracy ASR and speech translation. It is built by modality-aligning Granite 3.3 8B Instruct with a conformer speech encoder and delivers state-of-the-art English transcription accuracy among open models.
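Architecture: Two-pass speech-language model. A conformer speech encoder, modality-aligned to Granite 3.3 8B Instruct, first converts audio to text; any further reasoning over that text is done in a second pass by calling the Granite LLM.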
Modality: Audio in / text out. Supports a pure-text fallback that calls the underlying Granite LLM directly, preserving full text capabilities and safety alignment.
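The two-pass flow and the pure-text fallback can be sketched as plain control flow. This is a minimal illustration under stated assumptions: `TwoPassPipeline`, `transcribe`, and `llm` are hypothetical names, not part of any IBM or Hugging Face API, and the toy stubs stand in for the real speech model and the Granite LLM so the example runs without model weights.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TwoPassPipeline:
    """Hypothetical wrapper illustrating the two-pass flow (not an IBM API)."""

    # Pass 1: (audio, instruction) -> text. Stands in for the speech model.
    transcribe: Callable[[bytes, str], str]
    # Pass 2 and text-only fallback: prompt -> text. Stands in for the Granite LLM.
    llm: Callable[[str], str]

    def run(self, prompt: str, audio: Optional[bytes] = None) -> str:
        if audio is None:
            # Pure-text fallback: no audio, so the prompt goes straight to the LLM.
            return self.llm(prompt)
        # Pass 1: convert the audio into a transcript.
        transcript = self.transcribe(audio, "Transcribe the speech into written text.")
        # Pass 2: give the transcript to the text-only LLM for the downstream task.
        return self.llm(f"{prompt}\n\nTranscript:\n{transcript}")


# Toy stubs (hypothetical) so the sketch runs without downloading any model.
asr_calls = []


def fake_transcribe(audio: bytes, instruction: str) -> str:
    asr_calls.append(instruction)  # record that pass 1 was invoked
    return audio.decode("utf-8")  # pretend the bytes already hold the spoken words


def fake_llm(prompt: str) -> str:
    return "LLM<" + prompt + ">"


pipeline = TwoPassPipeline(transcribe=fake_transcribe, llm=fake_llm)
text_only = pipeline.run("What is ASR?")
from_audio = pipeline.run("Summarize the call.", audio=b"the meeting moved to friday")
print(text_only)
print(from_audio)
```

Keeping the two passes separate means the text-only path never touches the speech components, which is how the fallback can preserve the LLM's original text behavior.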
Training: Modality-aligned from granite-3.3-8b-instruct on public ASR and automatic speech translation (AST) corpora plus synthetic translation data. This checkpoint was trained for 13 days on 32 H100 GPUs on IBM's Blue Vela supercomputer. Revision 3.3.2 introduced multilingual support for English, French, German, Spanish, and Portuguese.
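As a rough sense of scale, the training figures convert to GPU-hours as follows. This assumes all 32 GPUs were busy for the full 13 days of wall-clock time; no utilization figure is given, so treat the result as an upper-bound estimate.

```python
# Figures from the training note: 32 H100 GPUs for 13 days.
num_gpus = 32
days = 13
hours_per_day = 24

# Assumes every GPU ran for the whole period (no utilization data available).
gpu_hours = num_gpus * days * hours_per_day
print(gpu_hours)  # 9984
```

That is just under 10,000 H100 GPU-hours for this checkpoint.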
Use cases: Enterprise-grade English ASR (outperforming several competitors trained on far more proprietary data) and English↔X speech translation for major European languages. Transcripts are suited to downstream Q&A and summarization via the Granite LLM.