CosyVoice is Alibaba’s open-weight text-to-speech family with multilingual support and zero-shot voice cloning, popular for self-hosted streaming TTS.
| Stat | Value |
|---|---|
| Models in family | 1 |
| Open weight | 1 |
| API only | 0 |
| Avg score | 41.7 |
| Top benchmark | 5.0 (MOS) |
| Total HF downloads | 2.9K |
| Primary modality | Audio |
| First release | Dec 2024 |
| Latest release | Dec 2024 |
Every release in the CosyVoice family, ranked by composite score across benchmarks, popularity, efficiency, and versatility.
| # | Model | Modality | Score | Params | Released |
|---|---|---|---|---|---|
| 1 | | audio | CC 41.7 | — | Dec 2024 |
When each release shipped, newest first. Useful for tracking version cadence.
Dec 13, 2024
Composite grades across this family. Higher is better, blending benchmarks, popularity, and efficiency.
Models with downloadable weights, ranked by composite score.
| # | Model | Modality | Score | Params | Released |
|---|---|---|---|---|---|
| 1 | | audio | CC 41.7 | — | Dec 2024 |