Z.ai's compact 1.5B-parameter open-source ASR model from the GLM family, optimized for real-world conditions, including Chinese dialects (notably Cantonese) and whispered or quiet speech, while outperforming Whisper V3 on several benchmarks.
A solid 1.5B-parameter dense audio model from Z.ai. Treat the modality benchmarks above as the leading indicator of fit — composite scoring across modalities is still maturing.
GLM-ASR-Nano-2512 is a robust open-source automatic speech recognition model from Z.ai (Zhipu AI), part of the GLM model family. The version string 2512 denotes the December 2025 release.
Supported via the GlmAsrForConditionalGeneration class in 🤗 Transformers (requires transformers ≥ 5.0.0). Inference backends include transformers, vLLM, and SGLang (OpenAI-compatible /v1/audio/transcriptions endpoint). Primary coverage is English and Chinese (Mandarin), with explicit dialect support including Cantonese.
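As a minimal sketch of how a client would talk to such a deployment: the snippet below builds (without sending) a multipart POST matching the OpenAI-style /v1/audio/transcriptions request shape that vLLM and SGLang serve. The base URL, dummy API key, and file name are assumptions for illustration, not part of this model's documentation.

```python
import urllib.request

# Assumed local server URL for a self-hosted vLLM/SGLang deployment.
BASE_URL = "http://localhost:8000"


def build_transcription_request(audio_bytes: bytes,
                                model: str = "zai-org/GLM-ASR-Nano-2512",
                                boundary: str = "glmasrboundary") -> urllib.request.Request:
    """Prepare (without sending) a multipart/form-data POST in the
    OpenAI-compatible /v1/audio/transcriptions request shape."""
    crlf = "\r\n"
    head = (
        f"--{boundary}{crlf}"
        f'Content-Disposition: form-data; name="model"{crlf}{crlf}'
        f"{model}{crlf}"
        f"--{boundary}{crlf}"
        f'Content-Disposition: form-data; name="file"; filename="speech.wav"{crlf}'
        f"Content-Type: audio/wav{crlf}{crlf}"
    ).encode()
    tail = f"{crlf}--{boundary}--{crlf}".encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Many local OpenAI-compatible servers accept a placeholder key.
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


req = build_transcription_request(b"\x00" * 16)
print(req.full_url)  # http://localhost:8000/v1/audio/transcriptions
```

Sending the prepared request with `urllib.request.urlopen(req)` (or an equivalent HTTP client) would return JSON with a `text` field containing the transcription, per the OpenAI audio API convention.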
Chinese meeting/interview transcription, Cantonese media transcription, noisy/far-field speech recognition, quiet-speech and whisper transcription, self-hosted on-premise enterprise ASR.