
Alibaba Qwen's compact 0.6B-parameter, all-in-one multilingual ASR model supports 52 languages and dialects and is built on the Qwen3-Omni audio foundation model. It is optimized for ultra-low latency (~92 ms time to first token, TTFT) and on-device deployment.
A strong 0.6B-parameter dense audio model from Alibaba. Treat the modality benchmarks above as the leading indicator of fit, since composite scoring across modalities is still maturing.
Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.