Z.ai

GLM-ASR-Nano-2512

Z.ai's compact 1.5B-parameter open-source ASR model from the GLM family, optimized for real-world conditions, including Chinese dialects (notably Cantonese) and whispered or quiet speech. It is reported to outperform Whisper V3 on several benchmarks.

1.5B params, Dense

Our Take

Best for: Open-source ASR workloads

A solid 1.5B-parameter dense audio model from Z.ai. Treat the modality benchmarks listed under Performance & Scoring as the leading indicator of fit; composite scoring across modalities is still maturing.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 1.5B
Architecture: Dense
Provider: Z.ai
Download Size: 9.0 GB

Community

Monthly Downloads: 95.5K
Likes: 365
Last Updated: 1 month ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.

License

MIT

Performance & Scoring

Benchmarks

WER: 7.0%
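
The page does not say which test set or text normalization produced the WER figure, so the following is only a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the whitespace tokenization are illustrative assumptions, not the site's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Splitting on whitespace does not suit unsegmented Chinese text, where a character-level variant of the same calculation is the usual choice.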

Overall Score: 65.5 (BB)

Benchmark (40% weight): 85.9
Popularity (25% weight): 63.3
Efficiency (25% weight): 33.3
Versatility (10% weight): 70.0
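
The page does not state how the overall score is derived, but the listed weights and component scores are consistent with a plain weighted average. The sketch below assumes that formula; the dictionary keys and function name are illustrative.

```python
# Component scores and weights as listed on the page.
SCORES = {"benchmark": 85.9, "popularity": 63.3, "efficiency": 33.3, "versatility": 70.0}
WEIGHTS = {"benchmark": 0.40, "popularity": 0.25, "efficiency": 0.25, "versatility": 0.10}


def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of component scores (assumed formula)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weight for name, weight in weights.items())


if __name__ == "__main__":
    # 34.36 + 15.825 + 8.325 + 7.0 = 65.51, which rounds to the listed 65.5
    print(round(overall_score(SCORES, WEIGHTS), 1))
```

Under this assumption the low Efficiency score (33.3) is what pulls the total well below the Benchmark score of 85.9.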

Hardware Compatibility

See which devices can run this model and at what quality level.

102 devices


ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	SS	1.4 GB
Acer Veriton GN100 AI Mini	Acer	SS	1.4 GB
AMD Instinct MI300X	AMD	SS	1.4 GB
AMD Instinct MI325X	AMD	SS	1.4 GB
AMD Instinct MI355X	AMD	SS	1.4 GB
AMD Radeon RX 7600 8GB	AMD	SS	1.4 GB
AMD Radeon RX 7700 XT	AMD	SS	1.4 GB
AMD Radeon RX 7800 XT	AMD	SS	1.4 GB
AMD Radeon RX 7900 XT	AMD	SS	1.4 GB
AMD Radeon RX 7900 XTX	AMD	SS	1.4 GB
AMD Radeon RX 9070	AMD	SS	1.4 GB
AMD Radeon RX 9070 XT	AMD	SS	1.4 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	SS	1.4 GB
Apple M4	Apple	SS	1.4 GB
Apple M4 Max (40-core GPU)	Apple	SS	1.4 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	SS	1.4 GB
Apple M5	Apple	SS	1.4 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	SS	1.4 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	SS	1.4 GB
Apple Mac Mini (M1, 2020)	Apple	SS	1.4 GB
Apple Mac Mini (M2, 2023)	Apple	SS	1.4 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	SS	1.4 GB
Apple Mac Mini (M4, 2024)	Apple	SS	1.4 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	SS	1.4 GB
Apple Mac Studio (M1 Max, 2022)	Apple	SS	1.4 GB

Showing the first 25 of 102 devices (page 1 of 5).

About This Model

Overview

GLM-ASR-Nano-2512 is a robust open-source automatic speech recognition model from Z.ai (Zhipu AI), part of the GLM model family. The version string 2512 denotes the December 2025 release.

Architecture

Seq2Seq / encoder-decoder transformer exposed via the GlmAsrForConditionalGeneration class in 🤗 Transformers (requires transformers ≥ 5.0.0).
Chat-template interface: audio + text instruction prompt → transcription text (e.g. "Please transcribe this audio into text").
Served via transformers, vLLM, and SGLang (OpenAI-compatible /v1/audio/transcriptions endpoint).
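
As a rough illustration of the OpenAI-compatible route, the sketch below builds a multipart POST to /v1/audio/transcriptions using only the Python standard library. The base URL, the served model name `zai-org/GLM-ASR-Nano-2512`, and both helper functions are assumptions made for illustration; the actual host, port, and model name depend on how the vLLM or SGLang server was launched.

```python
import json
import os
import urllib.request
import uuid

# Assumed values: adjust to match the running server.
BASE_URL = "http://localhost:8000"
MODEL_NAME = "zai-org/GLM-ASR-Nano-2512"


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    model: str = MODEL_NAME,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + closing
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str) -> str:
    """Send a local audio file to the server and return the transcription text."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(audio_file.read(), os.path.basename(path))
    with urllib.request.urlopen(request) as response:
        # The OpenAI-style JSON response carries the transcript under "text".
        return json.load(response)["text"]
```

Calling `transcribe("meeting.wav")` against a running server would return the transcript as a string; the file name here is a placeholder.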

Key Capabilities

Exceptional dialect support: strong on standard Mandarin and English, and notably optimized for Cantonese (粤语) and other Chinese regional dialects.
Low-volume / whisper-speech robustness: specifically trained to accurately transcribe very quiet audio that trips up traditional ASR systems.
Reported state of the art among comparable open-source models on Chinese benchmarks: claimed lowest average error rate (~4.10) on suites including WenetSpeech Meeting and AISHELL-1, and reported to outperform OpenAI Whisper V3 on multiple benchmarks.

Languages

Primary coverage of English and Chinese (Mandarin), with explicit dialect support including Cantonese.

Use Cases

Chinese meeting/interview transcription, Cantonese media transcription, noisy/far-field speech recognition, quiet-speech and whisper transcription, self-hosted on-premise enterprise ASR.

Find the Best Hardware for This Model

Use our hardware calculator to find the optimal device for running this model.
