IBM

Granite Speech 3.3 8B

IBM's flagship 8B-parameter speech-language model for high-accuracy automatic speech recognition (ASR) and speech translation. It modality-aligns Granite 3.3 8B Instruct with a conformer encoder and targets state-of-the-art English transcription among open models.

9B params, Dense


Model Specifications

Parameters: 9B

Architecture: Dense

Training Cutoff: 2024-04

Provider: IBM

Download Size: 20.1 GB

Community

Monthly Downloads: 82.4K

Likes: 166

Last Updated: 22 days ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.
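A minimal sketch of fetching the files programmatically, assuming the `huggingface_hub` package is installed. The repository ID below is an assumption (this page does not state it), and `download_model` and the target directory are hypothetical names used only for illustration.

```python
from pathlib import Path

# Assumed repository ID; confirm it on the model's Hugging Face page.
REPO_ID = "ibm-granite/granite-speech-3.3-8b"


def download_model(local_dir: str = "granite-speech-3.3-8b") -> Path:
    """Download the full model snapshot (about 20.1 GB) and return its path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_model()` starts the transfer, so make sure roughly 20 GB of disk space is free before running it.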


License

Apache 2.0

Performance & Scoring

Benchmarks

WER: 5.7%
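For reference, word error rate (WER) is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below illustrates that general formula. It is not the evaluation script behind the 5.7% figure, and the function name and sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Real evaluations usually normalize text first (casing, punctuation, number formats), which this sketch does not attempt.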

Overall Score: 53.1 (C)

Benchmark (40% weight): 88.5

Popularity (25% weight): 44.7

Efficiency (25% weight): 2.2

Versatility (10% weight): 60.0
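The overall score appears to be a weighted sum of the four component scores, with the listed percentages as weights. That reading is an inference from the numbers rather than a documented formula, but the arithmetic below reproduces the displayed value.

```python
# (weight, score) pairs as listed on this page.
components = {
    "Benchmark": (0.40, 88.5),
    "Popularity": (0.25, 44.7),
    "Efficiency": (0.25, 2.2),
    "Versatility": (0.10, 60.0),
}


def overall_score(parts: dict) -> float:
    """Weighted sum of component scores; weights are expected to total 1."""
    return sum(weight * score for weight, score in parts.values())


score = overall_score(components)
print(round(score, 1))  # 53.1
```

The very low Efficiency score (2.2) carries a quarter of the weight, which is why the total sits well below the Benchmark score.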

Hardware Compatibility

See which devices can run this model and at what quality level.


83 devices


Device	Vendor	Tier	Size
Acer Veriton GN100 AI Mini	Acer	S	6.0 GB
AMD Instinct MI300X	AMD	S	6.0 GB
AMD Instinct MI325X	AMD	S	6.0 GB
AMD Instinct MI355X	AMD	S	6.0 GB
AMD Radeon RX 7700 XT	AMD	S	6.0 GB
AMD Radeon RX 7800 XT	AMD	S	6.0 GB
AMD Radeon RX 7900 XT	AMD	S	6.0 GB
AMD Radeon RX 7900 XTX	AMD	S	6.0 GB
AMD Radeon RX 9070	AMD	S	6.0 GB
AMD Radeon RX 9070 XT	AMD	S	6.0 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	S	6.0 GB
Apple M4	Apple	S	6.0 GB
Apple M4 Max (40-core GPU)	Apple	S	6.0 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	S	6.0 GB
Apple M5	Apple	S	6.0 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	S	6.0 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	S	6.0 GB
Apple Mac Mini (M1, 2020)	Apple	S	6.0 GB
Apple Mac Mini (M2, 2023)	Apple	S	6.0 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	S	6.0 GB
Apple Mac Mini (M4, 2024)	Apple	S	6.0 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	S	6.0 GB
Apple Mac Studio (M1 Max, 2022)	Apple	S	6.0 GB
Apple Mac Studio (M1 Ultra, 2022)	Apple	S	6.0 GB
Apple Mac Studio (M2 Max, 2023)	Apple	S	6.0 GB


About This Model

Granite Speech 3.3 8B

Architecture: Two-pass speech-language model with the following components:

Conformer acoustic encoder with block attention, self-conditioning and CTC training.
Windowed Q-Former speech-text adapter with 2 layers and 3 trainable queries per block (10× temporal downsampling).
Granite 3.3 8B Instruct LLM (128k context) as the language backbone.
LoRA adapters (rank 64) on the LLM's query/value projections, activated only in speech mode.
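To make the 10× temporal downsampling concrete, the sketch below estimates how many audio embeddings reach the LLM for a clip of a given length. Only "3 queries per block" and the overall 10× factor come from this page. The 100 frames/s input rate, the 2× encoder reduction, the 15-frame block size, and the padding of the last partial block are assumptions chosen to be consistent with that factor.

```python
import math

INPUT_FRAMES_PER_SECOND = 100  # assumed: one feature frame every 10 ms
ENCODER_DOWNSAMPLING = 2       # assumed: encoder halves the frame rate
BLOCK_SIZE = 15                # assumed: encoder frames per Q-Former block
QUERIES_PER_BLOCK = 3          # from this page

# 2 * (15 / 3) = 10, matching the 10x figure quoted above.
TOTAL_DOWNSAMPLING = ENCODER_DOWNSAMPLING * BLOCK_SIZE // QUERIES_PER_BLOCK


def llm_audio_embeddings(duration_seconds: float) -> int:
    """Estimate the number of audio embeddings passed to the LLM."""
    encoder_frames = math.ceil(
        duration_seconds * INPUT_FRAMES_PER_SECOND / ENCODER_DOWNSAMPLING
    )
    # Assumed: a final partial block is padded and still yields 3 queries.
    blocks = math.ceil(encoder_frames / BLOCK_SIZE)
    return blocks * QUERIES_PER_BLOCK


print(llm_audio_embeddings(30))  # 300
```

Under these assumptions the LLM sees about 10 embeddings per second of audio, so even long recordings use a small share of the 128k context.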

Modality: Audio in / text out. Supports a pure-text fallback that calls the underlying Granite LLM directly, preserving full text capabilities and safety alignment.
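The speech-mode/text-mode split can be pictured with a toy LoRA layer whose low-rank update is switched on only when audio is present. This follows the standard LoRA formulation, y = Wx + (alpha / r) · B(Ax), with tiny hand-picked matrices. It is an illustration, not IBM's implementation, and `LoRALinear`, `run`, and the `alpha` value are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer with an optional low-rank (LoRA) update."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float):
        self.weight = weight             # d_out x d_in, frozen base weights
        self.lora_a = lora_a             # r x d_in, trainable
        self.lora_b = lora_b             # d_out x r, trainable
        self.scale = alpha / len(lora_a)  # alpha / r
        self.adapter_enabled = False

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.adapter_enabled:
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


def run(layer: LoRALinear, x: Vector, has_audio: bool) -> Vector:
    """Enable the adapter for speech input; use base weights for text only."""
    layer.adapter_enabled = has_audio
    return layer.forward(x)


layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],  # identity base layer
    lora_a=[[1.0, 1.0]],              # rank r = 1
    lora_b=[[0.5], [0.0]],
    alpha=1.0,
)
print(run(layer, [1.0, 2.0], has_audio=False))  # [1.0, 2.0]
print(run(layer, [1.0, 2.0], has_audio=True))   # [2.5, 2.0]
```

Because the base weights are never modified, disabling the adapter returns exactly the original LLM behavior, which is how the text fallback can preserve the base model's capabilities.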

Training: Modality-aligned from granite-3.3-8b-instruct on public ASR and automatic speech translation (AST) corpora plus synthetic translation data. This particular checkpoint was trained for 13 days on 32 H100 GPUs on IBM's Blue Vela supercomputer. Revision 3.3.2 introduced multilingual support for English, French, German, Spanish and Portuguese.
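As a rough sense of scale, the training budget above can be converted to GPU-hours. The calculation assumes all 32 GPUs ran continuously for the full 13 days, which the page does not confirm.

```python
TRAINING_DAYS = 13
GPU_COUNT = 32
HOURS_PER_DAY = 24

# Upper-bound estimate: every GPU busy for the whole run.
gpu_hours = TRAINING_DAYS * HOURS_PER_DAY * GPU_COUNT
print(gpu_hours)  # 9984
```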

Use cases: Enterprise-grade English ASR (outperforming several competitors trained on far more proprietary data) and English↔X speech translation for major European languages. Transcripts can feed downstream Q&A and summarization via the Granite LLM.

Related Models

IBM

Granite Speech 3.3 2B

3B, Dense

IBM

Granite 4.0 1B Speech

2B, Dense

