NVIDIA Canary-Qwen-2.5B is a state-of-the-art hybrid Speech-Augmented Language Model (SALM) that combines the Canary-1B-Flash encoder with a Qwen3-1.7B LLM decoder, achieving a record 5.63% WER on the Hugging Face Open ASR leaderboard.
Canary-Qwen-2.5B is an English speech-augmented language model that topped the Hugging Face Open ASR leaderboard at release with a record 5.63% WER while running at 418 RTFx. It operates in two modes: an ASR mode that transcribes speech to text, and an LLM mode in which the underlying decoder reasons over transcripts (e.g. summarization or question answering).
Architecture: A Speech-Augmented Language Model (SALM) combining two base models, the nvidia/canary-1b-flash FastConformer encoder and the Qwen/Qwen3-1.7B LLM decoder, connected via a linear projection, with LoRA adapters applied to the LLM. The encoder emits one output frame every 80 ms (12.5 tokens per second). The tokenizer is inherited from Qwen3-1.7B. LLM parameters were frozen during training; only the speech encoder, projection, and LoRA parameters were trainable.
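The coupling above can be sketched in a few lines; a minimal, illustrative NumPy example in which the dimensions (512 for the encoder, 2048 for the LLM), the prompt length, and the random weights are placeholders rather than the model's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (placeholders, not the real model's sizes).
d_enc, d_llm = 512, 2048

# Encoder output: 40 s of audio at one frame per 80 ms -> 500 frames.
num_frames = int(40 / 0.080)
speech_feats = rng.standard_normal((num_frames, d_enc))

# The linear projection maps encoder frames into the LLM embedding space.
proj = rng.standard_normal((d_enc, d_llm)) * 0.02
speech_embeds = speech_feats @ proj

# Prompt token embeddings (e.g. a transcription instruction) would come
# from the frozen LLM's embedding table; 8 tokens here, purely invented.
prompt_embeds = rng.standard_normal((8, d_llm))

# Projected speech frames are spliced into the text token sequence and
# the combined sequence is fed to the (LoRA-adapted) LLM decoder.
llm_input = np.concatenate([prompt_embeds, speech_embeds], axis=0)
print(llm_input.shape)  # (508, 2048)
```

The projection is the only new bridge between the two pretrained models, which is why it is one of the few trainable components alongside the encoder and the LoRA adapters.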
Training: Trained using the NVIDIA NeMo toolkit for 90K steps on 32 NVIDIA A100 80GB GPUs. The training data comprises approximately 1.3B tokens, roughly 40M (speech, text) pairs drawn from 26 datasets. Maximum input audio length was 40 seconds per training sample, with a maximum sequence length of 1024 tokens.
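The figures above imply a simple per-sample token budget; a quick check, assuming one speech token per 80 ms encoder frame as stated in the architecture description:

```python
# Maximum audio length (s) and encoder frame duration (s), from the card.
max_audio_s = 40.0
frame_s = 0.080

speech_tokens = int(max_audio_s / frame_s)  # 500 speech tokens at most
max_seq_len = 1024

# Remaining budget for prompt and output text tokens in a sample.
text_budget = max_seq_len - speech_tokens
print(speech_tokens, text_budget)  # 500 524
```

So even a maximum-length 40-second clip consumes under half of the 1024-token sequence, leaving room for the instruction and transcript.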
Use cases: Meeting summarization, podcast/interview transcription and analysis, enterprise voice-to-text with downstream LLM reasoning, agentic speech systems, accessibility services. Released under CC-BY-4.0 for commercial use.