Alibaba Qwen's compact 0.6B-parameter all-in-one multilingual ASR model supporting 52 languages and dialects, built on the Qwen3-Omni audio foundation model. Optimized for ultra-low latency (~92 ms time-to-first-token) and on-device deployment.
Access model weights, configuration files, and documentation.
See which devices can run this model and at what quality level.
Qwen3-ASR-0.6B is a lightweight automatic speech recognition model from Alibaba's Qwen team, released alongside the larger 1.7B variant. It is post-trained from the Qwen3-Omni audio-language foundation model and adopts a Large Audio-Language Model (LALM) paradigm: an audio encoder (AuT) produces acoustic features, a projector maps them into text-embedding space, and a Qwen3-based transformer decoder autoregressively emits transcriptions.
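As a rough illustration of this encoder → projector → decoder wiring, here is a toy-scale PyTorch sketch. The module choices, dimensions, vocabulary, and greedy loop are expository assumptions, not the released Qwen3-ASR implementation.

```python
import torch
import torch.nn as nn

# Toy-scale sketch of the LALM pipeline: audio encoder -> projector ->
# autoregressive text decoder. All dimensions and module designs below
# are illustrative assumptions, not the actual Qwen3-ASR architecture.
AUDIO_DIM, TEXT_DIM, VOCAB = 80, 256, 1000

audio_encoder = nn.GRU(AUDIO_DIM, 512, batch_first=True)   # stand-in for AuT
projector = nn.Linear(512, TEXT_DIM)                        # audio -> text-embedding space
embed = nn.Embedding(VOCAB, TEXT_DIM)                       # decoder token embeddings
layer = nn.TransformerEncoderLayer(TEXT_DIM, nhead=4, batch_first=True)
decoder = nn.TransformerEncoder(layer, num_layers=2)        # causal mask omitted for brevity
lm_head = nn.Linear(TEXT_DIM, VOCAB)

mel = torch.randn(1, 300, AUDIO_DIM)   # ~3 s of 16 kHz audio at a 10 ms hop (toy values)
acoustic, _ = audio_encoder(mel)       # acoustic features
prefix = projector(acoustic)           # projected into text-embedding space

tokens = [0]                           # BOS
for _ in range(5):                     # greedy autoregressive decoding
    text = embed(torch.tensor([tokens]))
    hidden = decoder(torch.cat([prefix, text], dim=1))
    next_id = lm_head(hidden[:, -1]).argmax(-1).item()
    tokens.append(next_id)
print("decoded token ids:", tokens[1:])  # untrained weights, so output is arbitrary
```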
Based on Qwen3-Omni, post-trained on large-scale speech-text pairs (exact dataset mix undisclosed). Audio input is resampled to 16 kHz and converted to mel-spectrogram features.
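A minimal front-end sketch of that preprocessing with torchaudio, assuming a Whisper-style log-mel setup (25 ms window, 10 ms hop, 128 mel bins); the model's actual feature extractor may use different parameters.

```python
import torch
import torchaudio

# Resample to 16 kHz and compute log-mel features. Window, hop, and
# n_mels are assumptions; check the model's feature-extractor config.
waveform, sr = torchaudio.load("speech.wav")       # (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)      # downmix to mono
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16_000, n_fft=400, hop_length=160, n_mels=128,
)(waveform)                                        # (1, n_mels, frames)
log_mel = torch.log(mel + 1e-6)                    # log compression
print(log_mel.shape)
```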
Real-time dictation, edge/on-device transcription, subtitling, voice agents, call-center analytics, and pipelines needing a strong accuracy/efficiency trade-off.
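For scenarios like those above, if the checkpoint ships with Hugging Face transformers support, transcription could look like the sketch below. The model ID and pipeline compatibility are assumptions; defer to the official model card for the actual loading recipe.

```python
from transformers import pipeline

# Assumption: the weights are published under an ID like
# "Qwen/Qwen3-ASR-0.6B" and work with the standard ASR pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="Qwen/Qwen3-ASR-0.6B",  # hypothetical model ID
)
result = asr("meeting.wav")  # pipeline resamples input to the model's rate
print(result["text"])
```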