Z.ai's compact 1.5B-parameter open-source ASR model from the GLM family, optimized for real-world conditions, including Chinese dialects (notably Cantonese) and whispered or quiet speech, while outperforming Whisper V3 on several benchmarks.
GLM-ASR-Nano-2512 is a robust open-source automatic speech recognition model from Z.ai (Zhipu AI), part of the GLM model family. The version string 2512 denotes the December 2025 release.
The model is exposed through the GlmAsrForConditionalGeneration class in 🤗 Transformers (requires transformers ≥ 5.0.0).
Supported inference frameworks: transformers, vLLM, and SGLang (OpenAI-compatible /v1/audio/transcriptions endpoint).
Languages: primary coverage of English and Chinese (Mandarin), with explicit dialect support including Cantonese.
Typical use cases: Chinese meeting and interview transcription, Cantonese media transcription, noisy or far-field speech recognition, quiet-speech and whisper transcription, and self-hosted on-premise enterprise ASR.
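For the self-hosted case, the OpenAI-compatible /v1/audio/transcriptions endpoint mentioned above can be called with a plain multipart/form-data POST. The following is a minimal sketch using only the Python standard library, not an official client. It assumes a server is already running and follows the usual OpenAI convention of a `model` text field, a `file` upload field, and a JSON response containing a `text` key. The helper names `build_transcription_request` and `transcribe`, the base URL `http://localhost:8000`, and the model identifier `zai-org/GLM-ASR-Nano-2512` are illustrative assumptions; use whatever name and address your server was actually launched with.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build a multipart/form-data POST for an OpenAI-compatible
    /v1/audio/transcriptions endpoint. No network traffic happens here."""
    boundary = uuid.uuid4().hex

    # Text field: the model name (assumed to match the served model).
    model_part = (
        "--" + boundary + "\r\n"
        + 'Content-Disposition: form-data; name="model"\r\n\r\n'
        + model + "\r\n"
    ).encode("utf-8")

    # File field: header, then the raw audio bytes, then a line break.
    file_header = (
        "--" + boundary + "\r\n"
        + 'Content-Disposition: form-data; name="file"; filename="'
        + filename + '"\r\n'
        + "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")

    closing = ("--" + boundary + "--\r\n").encode("utf-8")
    body = model_part + file_header + audio_bytes + b"\r\n" + closing

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
        method="POST",
    )


def transcribe(request, timeout=60):
    """Send the request and return the transcript.
    Assumes the server answers with JSON shaped like {"text": "..."}."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["text"]
```

With a server running, usage would look like `transcribe(build_transcription_request("http://localhost:8000", open("meeting.wav", "rb").read(), "meeting.wav", "zai-org/GLM-ASR-Nano-2512"))`, where `meeting.wav` is a placeholder file name. Keeping request construction separate from sending makes the payload easy to inspect without a live server.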