
27B total / 14B active MoE text-to-video model with SNR-governed dynamic expert routing. High-noise expert shapes macro composition; low-noise expert refines high-frequency detail.
A workable 27B-parameter MoE video generator from Alibaba. Treat the per-modality benchmarks as the leading indicator of fit; composite scoring across modalities is still maturing.
Wan2.2-T2V-A14B is a text-to-video generation model from Alibaba, part of their Wan2.2 family. At 27B total parameters with 14B active in a Mixture-of-Experts (MoE) architecture, it occupies a unique position: it's the first open-source video generation model to use MoE routing, and it directly competes with closed-source platforms like Runway Gen-3 and Pika on output quality while remaining fully local and Apache 2.0 licensed.
The model generates 5-second videos at both 480P and 720P resolution from text prompts. What matters for practitioners is that the MoE architecture delivers better results per compute unit than dense models of similar size. Alibaba's own Wan-Bench 2.0 benchmarks show it surpassing leading commercial models across most evaluation dimensions — motion coherence, semantic alignment, and aesthetic quality.
This isn't a toy. It's a production-capable video generation model that runs on consumer hardware with appropriate quantization, and it's the strongest open-source option available for local text-to-video workloads as of mid-2025.
Wan2.2-T2V-A14B uses a Mixture-of-Experts architecture with 27B total parameters, of which 14B are active during any single forward pass. The key innovation is SNR-governed dynamic expert routing: the model separates the denoising process across timesteps using specialized expert sub-networks.
The high-noise expert handles macro composition (broad layout, subject placement, and scene structure) during early denoising steps. The low-noise expert refines high-frequency detail (textures, fine edges, and temporal consistency) during later steps. This division of labor lets each expert specialize instead of forcing a single network to handle both coarse and fine-grained generation at once.
For inference, this translates directly to practical advantages. You get the output quality of a 27B model while only activating 14B parameters per step, reducing VRAM pressure and inference latency compared to a dense 27B model. The MoE architecture also enables better scaling: adding more experts increases model capacity without linearly increasing per-step compute.
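The routing idea can be pictured as a simple gate on the denoising timestep. The sketch below is illustrative only: the boundary value, tensor shapes, and the two stand-in experts are placeholders, not Wan2.2's actual SNR criterion or 14B sub-networks.

```python
import torch

def route_denoise_step(latents, timestep, high_noise_expert, low_noise_expert,
                       boundary=0.875):
    """One denoising step with noise-level routing: early (high-noise) steps go
    to the composition expert, late (low-noise) steps to the detail expert.
    `boundary` is a placeholder threshold on the normalized timestep, not the
    actual SNR-based criterion Wan2.2 uses."""
    expert = high_noise_expert if timestep >= boundary else low_noise_expert
    return expert(latents, timestep)  # only one expert runs per step

# Toy experts standing in for the two 14B sub-networks.
high = lambda x, t: x * 0.9    # would shape macro composition
low  = lambda x, t: x * 0.99   # would refine high-frequency detail

latents = torch.randn(1, 16, 21, 45, 80)          # illustrative latent video tensor
for t in torch.linspace(1.0, 0.0, steps=50):      # 50-step schedule, noise high -> low
    latents = route_denoise_step(latents, float(t), high, low)
```

The point of the gate is that per-step cost tracks a single expert, which is why only 14B of the 27B parameters are active on any given step.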
The model uses Wan2.2's custom VAE with a 16×16×4 compression ratio, which is aggressive by video generation standards. This means the latent space is compact, reducing the memory footprint during diffusion sampling and enabling higher resolution output without proportionally increasing VRAM requirements.
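As a quick sanity check on what that compression buys, the arithmetic below compares pixel-space and latent-space element counts for a 720P clip using the 16×16×4 ratio stated above; the frame count (81) and latent channel count (16) are assumptions for illustration, not published specs.

```python
# Rough latent-size arithmetic for a 720P clip at the 16x16x4 ratio above.
# Frame count (81) and latent channels (16) are illustrative assumptions.
frames, height, width = 81, 720, 1280
lat_t, lat_h, lat_w = frames // 4, height // 16, width // 16   # 20 x 45 x 80
lat_channels = 16

pixel_elems  = frames * height * width * 3
latent_elems = lat_t * lat_h * lat_w * lat_channels

print(f"latent grid: {lat_t} x {lat_h} x {lat_w}")
print(f"pixel elements:  {pixel_elems / 1e6:.1f} M")
print(f"latent elements: {latent_elems / 1e6:.2f} M")   # roughly 200x fewer values to denoise
```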
Wan2.2-T2V-A14B takes text prompts and generates video directly — no image input required. The model supports cinematic style control through detailed prompt engineering, with the training data including labeled examples of lighting, composition, contrast, and color tone.
Concrete use cases:
The model does not accept image or video inputs — it's text-only. For image-to-video workflows, Alibaba offers the I2V-A14B variant. The context window for text prompts is not officially specified, but practical usage suggests prompts of 50-200 tokens work well for coherent generation.
This is where the model's architecture pays off. The 14B active parameter count makes it feasible on consumer hardware, though you'll need to be strategic about quantization and memory management.
VRAM requirements by quantization:
Hardware recommendations:
Expected performance:
At Q4_K_M on an RTX 4090, expect a few seconds per sampling step, so total generation time is dominated by step count: 50-step sampling at 720P takes roughly 3-5 minutes. Lower step counts (20-30) reduce quality but cut generation time to 1-2 minutes.
Getting started:
The quickest path is ComfyUI with the Wan2.2 integration, which handles model loading, quantization, and workflow management. The official Wan2.2 GitHub repository provides multi-GPU inference scripts for higher-end setups. Diffusers integration is also available for Python-native workflows.
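For the Diffusers route, a minimal sketch is below. The model ID, frame count, and sampler settings are assumptions based on standard `WanPipeline` usage rather than confirmed values for this release, so check the official repository for the exact identifiers and recommended parameters.

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-format repo name -- confirm against the official release.
model_id = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # lowers peak VRAM at the cost of speed

frames = pipe(
    prompt="A slow dolly shot through a rain-soaked neon alley, cinematic lighting",
    height=720,
    width=1280,
    num_frames=81,            # ~5 seconds; adjust to the model's supported lengths
    num_inference_steps=50,   # fewer steps (20-30) trade quality for speed
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan22_t2v.mp4", fps=16)
```

On consumer GPUs the CPU-offload call trades speed for a smaller VRAM footprint; the ComfyUI route above remains the simpler path for quantized inference.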
vs. Stable Video Diffusion (SVD): SVD is a dense 1.4B model that generates 14-25 frame clips at 576x1024. Wan2.2-T2V-A14B produces significantly better motion coherence, higher resolution, and longer clips. The tradeoff is hardware requirements: SVD runs on 8GB VRAM, Wan2.2 needs 16GB+ for practical use. Choose SVD for quick prototyping on limited hardware; choose Wan2.2 for production-quality output.
vs. CogVideoX-5B: CogVideoX uses a dense 5B transformer architecture. Wan2.2's MoE approach with 14B active parameters delivers noticeably better composition and detail at similar inference costs. CogVideoX has a longer context window for prompts and supports image-to-video natively. Wan2.2 wins on output quality; CogVideoX wins on prompt flexibility and lower minimum VRAM (~12GB vs ~16GB).
vs. closed-source platforms (Runway Gen-3, Pika): Wan2.2 matches or exceeds these on objective quality benchmarks while running entirely locally. The tradeoffs are convenience (no API, no UI) and generation speed (slower than cloud inference). For practitioners who need privacy, unlimited generation, or production pipelines that can't depend on external APIs, Wan2.2 is the strongest open-source alternative available.