A heavily distilled 4B-parameter flow transformer optimized for interactive applications. Runs in under 10 GB of VRAM with 4-step inference.
FLUX.2 [klein] 4B is the highly efficient, distilled entry point into Black Forest Labs' second-generation image generation ecosystem. Unlike its larger siblings, the 4B variant is purpose-built for low-latency, interactive applications where sub-second inference is required. By pairing a rectified flow transformer architecture with a 4-step distillation process, it bridges the gap between high-fidelity diffusion models and real-time performance. For developers, the most significant shift is the license: FLUX.2 [klein] 4B ships under Apache 2.0, providing a fully open path for commercial deployment and fine-tuning without the restrictive terms of the earlier FLUX.1 non-commercial releases.
This model occupies a unique niche in the local AI hardware landscape. It is small enough to run on mid-range consumer GPUs while maintaining the structural coherence and prompt adherence that Black Forest Labs is known for. It competes directly with models like SDXL Turbo and the smaller AuraFlow variants, but offers a more modern unified architecture that handles text-to-image, image-to-image, and multi-reference editing within a single weight set.
The FLUX.2 [klein] 4B architecture is a dense rectified flow transformer consisting of 4 billion parameters. While it is significantly smaller than the 9B or 12B variants in the family, it retains the same core architectural logic, coupling the transformer with a 24B Mistral 3 vision-language model (VLM) text encoder for advanced world knowledge and complex prompt comprehension.
The "distilled" nature of the 4B model is its defining technical characteristic. It is optimized for a 4-step inference cycle, which dramatically reduces the computational overhead compared to standard diffusion models that require 20 to 50 steps. In practical terms, this means the model can generate a high-quality image in roughly 1.2 seconds on an NVIDIA RTX 5090 and under 0.3 seconds on enterprise-grade H100 or GB200 hardware. Because it is a dense model rather than a Mixture of Experts (MoE), its VRAM footprint is static and predictable, making it ideal for edge deployments and fixed-resource environments.
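The arithmetic behind the speedup is straightforward: with per-step cost roughly constant, end-to-end latency scales with step count. A back-of-envelope sketch (the per-step and overhead figures are illustrative, chosen to land near the quoted RTX 5090 number, not measurements):

```python
def latency_s(steps: int, per_step_s: float, overhead_s: float = 0.0) -> float:
    """Rough per-image latency: denoising steps dominate; the overhead term
    covers text encoding and VAE decode. All figures are illustrative."""
    return steps * per_step_s + overhead_s

# Hypothetical per-step cost tuned so the 4-step run lands near the
# ~1.2 s figure quoted for an RTX 5090.
distilled = latency_s(steps=4, per_step_s=0.25, overhead_s=0.2)   # ~1.2 s
standard = latency_s(steps=28, per_step_s=0.25, overhead_s=0.2)   # ~7.2 s
print(f"{distilled:.1f} s vs {standard:.1f} s "
      f"-> {standard / distilled:.1f}x faster")
```

The same arithmetic explains why distillation, not raw parameter count, is the main lever for interactive latency: cutting 28 steps to 4 buys more than most hardware upgrades.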
FLUX.2 [klein] 4B is designed for speed and versatility. It excels in "human-in-the-loop" workflows where immediate feedback is necessary. Because it supports multi-reference editing and image-to-image tasks natively, it is not just a generator but a comprehensive tool for local asset pipelines.
The sub-second inference speed allows developers to build "generate-as-you-type" interfaces. This is particularly useful for rapid prototyping or creative brainstorming tools where the latency of a cloud API or a larger local model would break the user's flow.
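A common pattern for such interfaces is debouncing: re-generate only after the user pauses typing, so the model is not re-invoked on every keystroke. A minimal, framework-free sketch (the `generate` callback and timing constants are placeholders, not part of any FLUX or diffusers API):

```python
from typing import Callable

class DebouncedGenerator:
    """Fire the generate callback only after the user has paused typing."""

    def __init__(self, generate: Callable[[str], None], pause_s: float = 0.3):
        self.generate = generate      # hypothetical image-generation callback
        self.pause_s = pause_s        # required silence before generating
        self._last_key = 0.0
        self._pending = None

    def on_keystroke(self, prompt: str, now: float) -> None:
        self._last_key = now
        self._pending = prompt

    def tick(self, now: float) -> None:
        if self._pending is not None and now - self._last_key >= self.pause_s:
            self.generate(self._pending)
            self._pending = None

# Simulated session: three quick keystrokes, then a pause.
calls = []
gen = DebouncedGenerator(generate=calls.append, pause_s=0.3)
gen.on_keystroke("a cat", now=0.00)
gen.tick(now=0.10)                      # still typing: no call
gen.on_keystroke("a cat in", now=0.15)
gen.on_keystroke("a cat in space", now=0.25)
gen.tick(now=0.60)                      # 0.35 s of silence: generate once
print(calls)                            # ['a cat in space']
```

In a real UI the `tick` would be driven by an event loop or timer; the point is that a 4-step model is fast enough that only the debounce interval, not inference, sets the perceived latency.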
With its Apache 2.0 license, this model is the primary candidate for local software integrations. Whether it is a photo editing plugin or a local game asset generator, the 4B parameter size ensures that the application remains responsive even when sharing VRAM with other processes.
The base version of the 4B model is specifically optimized for fine-tuning. Because of the smaller parameter count, practitioners can train LoRAs or perform full-parameter fine-tuning on consumer hardware (24GB VRAM) much faster and more cheaply than they could with the 9B or 12B versions.
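The cost advantage is easy to quantify: a LoRA adapter trains only rank * (d_in + d_out) parameters per adapted matrix instead of the full d_in * d_out. A back-of-envelope sketch (the hidden size and layer count below are illustrative assumptions, not published FLUX.2 [klein] figures):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA pair (A: d_in x r, B: r x d_out)."""
    return rank * (d_in + d_out)

# Illustrative dimensions only: assume a 3072-wide model with four
# adapted projections (q/k/v/o) in each of 48 blocks.
hidden = 3072
n_proj = 4 * 48
full = n_proj * hidden * hidden                       # full fine-tune
lora = n_proj * lora_params(hidden, hidden, rank=16)  # rank-16 LoRA
print(f"LoRA trains {lora / full:.2%} of those weights "
      f"({lora / 1e6:.1f}M vs {full / 1e9:.2f}B)")
```

At rank 16 the adapter trains on the order of 1% of the adapted weights, which is why optimizer state and gradients fit comfortably alongside the 4B base model on a 24GB card.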
To run FLUX.2 [klein] 4B locally, the primary constraint is VRAM. While the transformer itself is compact, the unified architecture, including the VLM text encoder, requires careful memory management.
On an NVIDIA RTX 4090, you can expect the distilled 4-step version to produce images in approximately 1.5 to 2 seconds. On the Apple Silicon side, an M3 Max or M4 Max with 64GB of Unified Memory provides a seamless experience, handling the weights and the KV cache without aggressive swapping.
The quickest way to get started is via the diffusers library or ComfyUI. For those looking to maximize efficiency, Q4_K_M GGUF quantization is the recommended "sweet spot," reducing the VRAM footprint to roughly 6-7 GB for the weights alone with negligible loss in image quality. If you are using a card with 8GB VRAM, quantization is mandatory to avoid OOM (Out of Memory) errors.
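The footprint of a quantized checkpoint can be estimated from parameter count and effective bits per weight. A rough estimator (the bits-per-weight figures are approximate averages for each GGUF format, and the numbers shown cover the 4B transformer only):

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-VRAM size of the weights alone.
    1B parameters at 8 bits per weight = 1 GB."""
    return params_billion * bits_per_weight / 8

# Approximate effective bits per weight; Q4_K_M mixes 4- and 6-bit
# blocks, so its average sits near 4.8 bpw rather than exactly 4.
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    print(f"{name}: ~{quantized_weight_gb(4, bpw):.1f} GB "
          f"for the 4B transformer alone")
```

Note that these figures cover only the diffusion transformer; the text encoder (and, during decode, the VAE) must also be resident, which is why quoted full-pipeline footprints are higher than the transformer-only estimate.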
When evaluating FLUX.2 [klein] 4B, the most common points of comparison are Stable Diffusion XL (SDXL) Turbo and SD3.5 Medium: it targets the same few-step latency class as SDXL Turbo while offering a more modern unified architecture and the prompt adherence the FLUX family is known for.
The choice to run FLUX.2 [klein] 4B locally usually comes down to the need for speed and the Apache 2.0 license. If you need a model that you can legally bake into a commercial product and run on a mid-range consumer GPU with sub-2-second latency, this is currently the industry standard.