Alibaba Qwen

Qwen3-ASR-1.7B

Alibaba Qwen's flagship 1.7B-parameter ASR model supporting 52 languages and dialects, achieving SOTA performance among open-source ASR models and competitive with top proprietary APIs.

1.7B params, Dense


Model Specifications

Parameters: 1.7B

Architecture: Dense

Provider: Alibaba Qwen

Download Size: 4.7 GB

Community

Monthly Downloads: 1.8M

Likes: 763

Last Updated: 2 months ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.
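A minimal sketch of fetching the weights programmatically, in case a scripted download is preferable to the web page. It uses `snapshot_download` from the `huggingface_hub` package; the repository id `Qwen/Qwen3-ASR-1.7B` and the local directory name are assumptions not stated on this page, so confirm them on the Hugging Face model page first.

```python
from pathlib import Path

# Assumed Hub repository id; this page does not spell it out.
REPO_ID = "Qwen/Qwen3-ASR-1.7B"


def download_model(repo_id: str = REPO_ID,
                   local_dir: str = "models/qwen3-asr-1.7b") -> Path:
    """Download every file in the model repository and return the local folder."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)


# Usage (needs network access and roughly 4.7 GB of free disk space):
#   model_dir = download_model()
#   print(model_dir)
```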


License

Apache 2.0

Performance & Scoring

Benchmarks

WER: 5.8%
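For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below implements that standard definition; this page does not say which test sets or text normalization produced the 5.8% figure, so the function illustrates the metric rather than reproducing the number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Published WER figures usually lowercase the text and strip punctuation before scoring, which this sketch leaves to the caller.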

Overall Score: 71.8 (AA)

Benchmark (40% weight): 88.5

Popularity (25% weight): 88.7

Efficiency (25% weight): 28.9

Versatility (10% weight): 70.0
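The page does not state how the overall score is derived, but a plain weighted sum of the four component scores, using the listed percentages as weights, reproduces the 71.8 figure. The sketch below shows that arithmetic; the weighted-sum formula is an inference from the numbers, not a documented method.

```python
# Component scores and weights as listed on this page.
COMPONENTS = {
    "Benchmark": (88.5, 0.40),
    "Popularity": (88.7, 0.25),
    "Efficiency": (28.9, 0.25),
    "Versatility": (70.0, 0.10),
}


def overall_score(components: dict[str, tuple[float, float]]) -> float:
    """Weighted sum of component scores (assumed formula, see note above)."""
    return sum(score * weight for score, weight in components.values())


# 35.4 + 22.175 + 7.225 + 7.0
print(round(overall_score(COMPONENTS), 1))  # 71.8
```

Under this reading, the low Efficiency score (28.9) is what pulls the total well below the Benchmark and Popularity scores.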

Hardware Compatibility

See which devices can run this model and at what quality level.


83 devices (first 25 listed)


Acer Veriton GN100 AI Mini	Acer	SS	1.5 GB
AMD Instinct MI300X	AMD	SS	1.5 GB
AMD Instinct MI325X	AMD	SS	1.5 GB
AMD Instinct MI355X	AMD	SS	1.5 GB
AMD Radeon RX 7600 8GB	AMD	SS	1.5 GB
AMD Radeon RX 7700 XT	AMD	SS	1.5 GB
AMD Radeon RX 7800 XT	AMD	SS	1.5 GB
AMD Radeon RX 7900 XT	AMD	SS	1.5 GB
AMD Radeon RX 7900 XTX	AMD	SS	1.5 GB
AMD Radeon RX 9070	AMD	SS	1.5 GB
AMD Radeon RX 9070 XT	AMD	SS	1.5 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	SS	1.5 GB
Apple M4	Apple	SS	1.5 GB
Apple M4 Max (40-core GPU)	Apple	SS	1.5 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	SS	1.5 GB
Apple M5	Apple	SS	1.5 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	SS	1.5 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	SS	1.5 GB
Apple Mac Mini (M1, 2020)	Apple	SS	1.5 GB
Apple Mac Mini (M2, 2023)	Apple	SS	1.5 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	SS	1.5 GB
Apple Mac Mini (M4, 2024)	Apple	SS	1.5 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	SS	1.5 GB
Apple Mac Studio (M1 Max, 2022)	Apple	SS	1.5 GB
Apple Mac Studio (M1 Ultra, 2022)	Apple	SS	1.5 GB


About This Model

Overview

Qwen3-ASR-1.7B is the flagship model in the Qwen3-ASR family from Alibaba Cloud's Qwen team. It is a Large Audio-Language Model (LALM) post-trained from the Qwen3-Omni foundation model, pairing an audio encoder with a Qwen3 transformer decoder via a learned projector.

Capabilities

52 languages and dialects: 30 major languages (Chinese, English, Cantonese, Arabic, German, French, Spanish, Portuguese, Indonesian, Italian, Korean, Russian, Thai, Vietnamese, Japanese, Turkish, Hindi, Malay, Dutch, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Hungarian, Macedonian, Romanian) plus 22 Chinese dialects (e.g., Cantonese HK/GD, Wu, Minnan, Sichuan, Shanghai).
All-in-one: joint language identification + transcription.
Robust recognition in noisy/complex acoustic conditions, singing voice, accents, and songs with background music.
State of the art (SOTA) among open-source ASR models; competitive with GPT-4o-transcribe-class commercial APIs.
Unified streaming & offline inference; long-audio support up to ~20 minutes per pass.
Companion Qwen3-ForcedAligner-0.6B provides multilingual word/sentence timestamp alignment.
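Recordings longer than the roughly 20-minute per-pass limit have to be processed in several passes. The sketch below computes fixed-length time spans for such a recording; the function name and the hard 20-minute default are illustrative assumptions, and the page does not say whether the official toolkit performs this splitting automatically.

```python
def split_into_passes(total_seconds: float,
                      max_pass_seconds: float = 20 * 60) -> list[tuple[float, float]]:
    """Return (start, end) spans in seconds, each at most max_pass_seconds long."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if max_pass_seconds <= 0:
        raise ValueError("max_pass_seconds must be positive")

    spans = []
    start = 0.0
    while start < total_seconds:
        # The final span is shortened so it ends exactly at the recording's end.
        end = min(start + max_pass_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


# A 50-minute recording needs three passes: 20 + 20 + 10 minutes.
for start, end in split_into_passes(50 * 60):
    print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

Cutting at fixed timestamps can split a word in half, so a practical pipeline would more likely cut at a nearby silence or overlap adjacent spans slightly.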

Inference Toolkit

Ships with the qwen-asr PyPI package, a vLLM backend for batch, async, and streaming serving, and Docker images.

Use Cases

Meeting transcription, podcast/video subtitling, multilingual voice agents, call-center QA, media monitoring, and research on large audio-language models.

Related Models

Alibaba Qwen

Qwen3-ASR-0.6B

0.6B, Dense

