IBM's compact 2B-parameter speech-language model handles multilingual ASR and bidirectional speech translation. It ranks #1 on the OpenASR multilingual leaderboard (5.52 average WER) while running efficiently on edge devices.
Architecture: A specialized acoustic encoder coupled with the Granite 4.0 1B language backbone.
Modality: Audio-to-text, with a text-only fallback handled by the Granite 4.0 backbone alone. Supports speculative decoding for faster inference.
Training: Trained for 30 days on 8 H100 GPUs on IBM's Blue Vela cluster (26 days for the acoustic encoder, 4 for the projector), using public ASR/AST corpora plus synthetic data targeting Japanese ASR, keyword-biased ASR, and speech translation.
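Since the card mentions speculative decoding, a toy sketch may help illustrate the idea: a cheap draft model proposes several tokens at once, and the expensive target model verifies them, accepting the matching prefix and correcting the first mismatch. This is a deliberately simplified, hypothetical illustration with deterministic toy "models" over integer tokens, not the model's actual inference code.

```python
# Toy illustration of speculative decoding (hypothetical simplification).
# Both "models" here are deterministic next-token functions over integers.

def draft_model(prefix):
    # Cheap draft: next token is (last token + 1) mod 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Expensive target: same rule, except it maps a proposed 5 to 7,
    # so draft and target occasionally disagree.
    nxt = (prefix[-1] + 1) % 10
    return 7 if nxt == 5 else nxt

def speculative_decode(prefix, n_tokens, k=4):
    """Generate n_tokens; the draft proposes k tokens at a time, the
    target accepts the matching prefix and corrects the first mismatch."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        # Draft proposes k tokens autoregressively (the cheap pass).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies each proposed token in turn (the expensive pass,
        # which in a real model scores all k positions in one forward call).
        for t in proposal:
            expected = target_model(out)
            if t == expected:
                out.append(t)          # accepted draft token
            else:
                out.append(expected)   # correction from the target model
                break
            if len(out) - len(prefix) >= n_tokens:
                break
    return out[len(prefix):]
```

By construction the output is identical to decoding with the target model alone; the speedup in real systems comes from verifying all k draft tokens in a single target forward pass instead of k sequential ones.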
Capabilities & use cases: