First major open-source foundation model generating fully synchronized 4K video and native audio in a single forward pass. 14B visual + 5B audio params; ~18× faster than Wan 2.2 14B on H100.
LTX-2 19B is Lightricks' open-source audio-video foundation model that generates fully synchronized 4K video and native audio in a single forward pass. Released in January 2026, it represents the first production-ready open-weight model capable of joint video and audio generation without requiring separate post-production pipelines.
The model uses a dense 19B parameter architecture split asymmetrically: 14B parameters dedicated to visual generation and 5B to audio synthesis. This design allows the model to produce up to 20 seconds of 4K video at 50 frames per second with matching sound effects, dialogue, and ambient audio—all generated simultaneously and automatically synchronized.
Lightricks, the Israel-based company behind consumer apps like Facetune and Videoleap, released LTX-2 under an Apache 2.0-compatible community license. Commercial use is free for companies generating less than $10 million in annual revenue. The full model weights, training code, and documentation are publicly available on GitHub and HuggingFace.
LTX-2 uses a Diffusion Transformer (DiT) architecture with two specialized processing streams that communicate through bidirectional cross-attention layers. The video stream handles spatial detail, motion consistency, and temporal coherence. The audio stream manages sound generation, dialogue timing, and environmental audio. This cross-attention mechanism ensures audio events align precisely with visual cues—when a door closes on screen, the sound occurs at the exact moment; when characters speak, lip movements sync with dialogue automatically.
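The two-stream design can be sketched with a toy PyTorch module (dimensions, head counts, and layer structure here are illustrative assumptions, not Lightricks' actual implementation): each stream's tokens attend over the other stream's tokens via standard cross-attention, in both directions.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Toy two-stream block: video tokens attend to audio tokens and
    vice versa. All sizes are illustrative only."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.audio_to_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video, audio):
        # Video queries attend over audio keys/values (and vice versa),
        # so each stream's update is conditioned on the other modality.
        v_out, _ = self.audio_to_video(query=video, key=audio, value=audio)
        a_out, _ = self.video_to_audio(query=audio, key=video, value=video)
        return video + v_out, audio + a_out

block = BidirectionalCrossAttention()
video_tokens = torch.randn(1, 128, 256)  # (batch, video tokens, dim)
audio_tokens = torch.randn(1, 64, 256)   # (batch, audio tokens, dim)
v, a = block(video_tokens, audio_tokens)
print(tuple(v.shape), tuple(a.shape))    # (1, 128, 256) (1, 64, 256)
```

Because attention runs in both directions, audio timing can condition on visual motion and vice versa, which is what makes single-pass synchronization possible without a separate alignment step.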
The model processes inputs through modality-specific VAEs (Variational Autoencoders) that compress raw signals into efficient latent representations. This compression achieves a 1:192 ratio, allowing the model to handle high-resolution content without excessive memory requirements.
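To see what a 1:192 compression ratio buys, a rough back-of-the-envelope calculation (the numbers and latent layout here are illustrative; the model card does not document the exact latent shape):

```python
# Rough arithmetic for the 1:192 VAE compression ratio.
# Raw signal: 1 second of 4K RGB video at 50 fps.
width, height, channels, fps, seconds = 3840, 2160, 3, 50, 1

raw_values = width * height * channels * fps * seconds
latent_values = raw_values // 192  # 1:192 compression of raw values

print(f"raw values:    {raw_values:,}")     # 1,244,160,000
print(f"latent values: {latent_values:,}")  # 6,480,000

# The transformer operates on the ~192x smaller latent tensor,
# which is what keeps 4K generation within feasible memory budgets.
```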
LTX-2 supports multiple generation modes, including text-to-video and image-to-video, through a single unified architecture.
The model ships with several checkpoint variants optimized for different use cases. The ltx-2-19b-dev variant is the full model in bf16 precision, suitable for fine-tuning and maximum quality. Quantized versions (fp8, fp4) reduce VRAM requirements for inference-only deployments. The ltx-2-19b-distilled checkpoint uses 8 inference steps with CFG=1, dramatically reducing generation time for applications where absolute maximum quality is less critical than throughput.
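A rough step-count comparison shows where the distilled checkpoint's speedup comes from (the full model's step count below is an assumption for illustration; actual defaults may differ):

```python
# Denoising passes dominate generation time, so fewer passes ~ proportional speedup.
full_steps = 40       # assumed default for the full dev checkpoint (illustrative)
distilled_steps = 8   # documented for ltx-2-19b-distilled

# With CFG > 1, each step runs two forward passes (conditional + unconditional);
# CFG=1 needs only one, which widens the gap further.
full_passes = full_steps * 2       # CFG enabled
distilled_passes = distilled_steps * 1  # CFG=1

print(f"~{full_passes / distilled_passes:.0f}x fewer forward passes")  # ~10x
```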
Spatial and temporal upscaler models enable multi-stage pipelines for higher resolution and framerate outputs. The x2 spatial upscaler operates on LTX latents to increase resolution, while the x2 temporal upscaler increases frames per second.
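As a concrete example of how a multi-stage pipeline scales its output, chaining a base pass with the x2 spatial and x2 temporal upscalers works out as follows (the base resolution and framerate are chosen for illustration):

```python
def apply_spatial_x2(w, h, fps):
    """x2 spatial upscaler: doubles width and height, framerate unchanged."""
    return w * 2, h * 2, fps

def apply_temporal_x2(w, h, fps):
    """x2 temporal upscaler: doubles framerate, resolution unchanged."""
    return w, h, fps * 2

# Illustrative base pass at 1920x1080 / 25 fps, then both upscalers.
w, h, fps = 1920, 1080, 25
w, h, fps = apply_spatial_x2(w, h, fps)   # -> 3840x2160
w, h, fps = apply_temporal_x2(w, h, fps)  # -> 50 fps
print(f"{w}x{h} @ {fps} fps")  # 3840x2160 @ 50 fps
```

Generating at a lower base resolution and upscaling in latent space is cheaper than generating 4K at 50 fps directly, since the expensive transformer passes run on the smaller base latents.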
LTX-2 excels at generating short-form video content with synchronized audio in a single workflow. The primary use cases for local deployment include:
Content creation: Generate b-roll footage, product demonstrations, or atmospheric clips with matching ambient audio. The image-to-video mode is particularly useful for animating reference images into dynamic scenes.
Game development: Produce placeholder or prototype video sequences with sound effects and environmental audio. The 4K output resolution supports integration into higher-fidelity production pipelines.
Video prototyping: Quickly generate test footage with audio before committing to full production. The distilled checkpoint's fast inference enables iterative prototyping on consumer hardware.
Audio-visual research: The open weights and training code enable experimentation with fine-tuning, LoRA adaptation, and custom training pipelines. The ltx-2-19b-distilled-lora-384 checkpoint provides a lightweight entry point for parameter-efficient fine-tuning.
The model's text-to-video and image-to-video capabilities accept English prompts. Multi-language support is listed in the model card metadata, but performance characteristics for non-English inputs are not documented.
LTX-2 19B requires substantial GPU memory for local inference. The exact VRAM requirements depend on the checkpoint variant and quantization level chosen.
| Checkpoint | Precision | Est. VRAM | Best For |
|------------|-----------|-----------|----------|
| ltx-2-19b-dev | bf16 | 40-48GB | Fine-tuning, maximum quality |
| ltx-2-19b-dev-fp8 | fp8 | 24-32GB | High-quality inference on H100/A100 |
| ltx-2-19b-dev-fp4 | nvfp4 | 16-20GB | Consumer GPU inference |
| ltx-2-19b-distilled | bf16 | 32-40GB | Fast inference, production use |
For consumer GPUs, the fp4 quantized checkpoint (ltx-2-19b-dev-fp4) is the most practical starting point. An RTX 4090 (24GB) can run this variant, though generation times will be longer than on datacenter hardware. The M4 Max (with unified memory configurations of 64GB or 128GB) provides another viable option for macOS workflows, though CUDA-capable NVIDIA GPUs generally deliver faster generation.
For most users running LTX-2 19B on consumer or prosumer hardware, the ltx-2-19b-dev-fp4 checkpoint offers the best balance between quality and accessibility. The fp8 variant is preferable if you have access to an A100 or H100, as it preserves more quality while still reducing VRAM compared to bf16.
The distilled checkpoint (ltx-2-19b-distilled) with 8 inference steps provides the fastest path to generated output, at the cost of some visual fidelity. For applications where iteration speed matters more than absolute quality—prototyping, batch generation, or real-time previews—the distilled version is the recommended choice.
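The VRAM table above can be encoded as a simple lookup helper; the thresholds mirror the estimated upper bounds in the table (estimates, not measured requirements), and the helper itself is a hypothetical convenience, not part of the LTX-2 tooling:

```python
# Estimated upper-bound VRAM (GB) per checkpoint, taken from the table above.
CHECKPOINTS = [
    ("ltx-2-19b-dev-fp4", 20),    # nvfp4, consumer GPUs
    ("ltx-2-19b-dev-fp8", 32),    # fp8, H100/A100
    ("ltx-2-19b-distilled", 40),  # bf16, fast inference
    ("ltx-2-19b-dev", 48),        # bf16, fine-tuning / max quality
]

def checkpoints_that_fit(vram_gb):
    """Return checkpoint names whose estimated VRAM ceiling fits the GPU."""
    return [name for name, need in CHECKPOINTS if need <= vram_gb]

print(checkpoints_that_fit(24))  # ['ltx-2-19b-dev-fp4']
print(checkpoints_that_fit(80))  # all four variants
```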
On an H100 SXM, LTX-2 19B generates video approximately 18× faster than Wan 2.2 14B. Specific tokens-per-second or generation-time figures vary significantly based on resolution, duration, and checkpoint selection. The distilled checkpoint with 8 steps will complete generation substantially faster than the full model requiring more denoising steps.
ComfyUI: The recommended path for most users. Install the LTXVideo nodes through ComfyUI Manager for a visual workflow interface. Lightricks maintains official documentation for this integration.
Diffusers: LTX-2 is supported in the HuggingFace Diffusers library for programmatic access. The two-stage generation pipeline is recommended for production quality.
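A minimal sketch of programmatic access via Diffusers. The repository id, step count, and output handling below are assumptions built on the generic `DiffusionPipeline` API, not confirmed details for LTX-2; consult the official model card for the exact pipeline class and the recommended two-stage workflow.

```python
def generate_clip(prompt: str, out_path: str = "clip.mp4"):
    """Hypothetical single-stage generation sketch. Imports live inside the
    function so this file can be read or imported without diffusers installed."""
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    # Repo id is an assumption -- substitute the actual HuggingFace model id.
    pipe = DiffusionPipeline.from_pretrained(
        "Lightricks/LTX-2", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Generate frames and write them out; fps matches the model's 50 fps output.
    frames = pipe(prompt=prompt, num_inference_steps=30).frames[0]
    export_to_video(frames, out_path, fps=50)

# Usage (requires a CUDA GPU with sufficient VRAM and the model weights):
# generate_clip("A door slowly closes in an empty hallway, ambient room tone")
```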
Direct PyTorch: Clone the official repository and follow the installation instructions. Requirements include Python ≥3.12, CUDA >12.7, and PyTorch ~=2.7. The monorepo includes model definitions (ltx-core), pipelines (ltx-pipelines), and training capabilities (ltx-trainer).
Ollama: Official Ollama support for LTX-2 has not yet landed. If it arrives, it would offer the fastest path to local experimentation, requiring only a single command after downloading the model weights. Check the Ollama model library for availability and updated instructions.
LTX-2 19B vs. Wan 2.2 14B: Wan 2.2 is a video-only model without native audio generation. LTX-2's primary advantage is the unified audio-video pipeline that eliminates separate audio generation and synchronization workflows. On H100 hardware, LTX-2 is approximately 18× faster for comparable video quality. For projects requiring audio, LTX-2 is the clear choice. For video-only applications where audio is unnecessary, Wan 2.2 may offer simpler deployment.
LTX-2 19B vs. CogVideoX: CogVideoX is another open-source video generation option, but lacks native audio generation. The architecture differences mean each model has distinct strengths in motion handling, prompt adherence, and visual quality. LTX-2's DiT-based design with cross-modal attention provides tighter audio-visual synchronization than models that generate audio separately.
The choice between these models depends on your requirements. If you need synchronized audio in a single pass, LTX-2 19B is the only open-source option at this capability level. If you require video-only generation and have specific architecture preferences, CogVideoX or Wan 2.2 may be worth evaluating for your particular use case.