9B parameter distilled rectified flow transformer paired with an 8B Qwen3 text embedder. Sub-second inference at ~19.6GB VRAM.
FLUX.2 [klein] 9B is a high-speed, distilled rectified flow transformer developed by Black Forest Labs. It occupies a strategic middle ground in the FLUX ecosystem, offering a significant quality uplift over the lightweight 4B variant while maintaining the "sub-second" inference speeds that define the Klein family. Unlike the massive 32B FLUX.2 Dev, the 9B model is specifically engineered for local deployment on high-end consumer hardware, providing professional-grade image synthesis and text rendering without the latency typical of foundation models.
This model is a direct competitor to mid-sized diffusion models like Stable Diffusion 3.5 Medium, but it distinguishes itself through its step-distilled architecture. It is designed for practitioners who require real-time or near-real-time generation for interactive applications, rapid prototyping, or local creative workflows where waiting 15–30 seconds for a single generation is unacceptable.
The FLUX.2 [klein] 9B architecture is a dense rectified flow transformer consisting of 9 billion parameters. It is paired with an 8B Qwen3 text embedder, which provides the model with advanced linguistic comprehension and superior text-within-image rendering compared to previous generations.
The "distilled" version of this model uses 4-step guidance distillation to achieve its performance targets. By reducing the number of sampling steps required to reach convergence, the model can generate high-fidelity 1024x1024 images in as few as four iterations. For developers, this means the bottleneck shifts from compute cycles to VRAM throughput.
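The mechanics of few-step sampling can be seen in the sampler loop itself. The NumPy sketch below is a toy illustration, not the real FLUX pipeline: it integrates a rectified-flow ODE with exactly four Euler steps, using an assumed perfectly straight (rectified) velocity field. When the learned flow is nearly straight, a handful of coarse steps is enough to land on the target, which is what step distillation exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_sample(velocity, x0, num_steps=4):
    # Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (sample)
    # with a fixed, small number of Euler updates: the 4-step
    # regime used by guidance-distilled rectified-flow models.
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)  # one update per sampling step
    return x

# Toy velocity field: a perfectly rectified (straight-line) flow
# from the initial noise sample toward a fixed target.
x0 = rng.standard_normal(4)
target = np.full(4, 2.0)
velocity = lambda x, t: target - x0

x_final = euler_sample(velocity, x0, num_steps=4)
# For a straight path, four Euler steps reach the target exactly;
# a curved (undistilled) flow would need many more steps.
```

In the real model the velocity field is the 9B transformer, so each of the four steps is one full forward pass; this is why the bottleneck becomes memory bandwidth rather than step count.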
The model utilizes a unified architecture that handles three distinct tasks within a single weight set: text-to-image generation, image editing (including inpainting), and multi-reference fusion.
Because the model is dense rather than utilizing a Mixture of Experts (MoE) approach, every parameter is active during inference. This results in highly predictable memory usage and consistent performance across different hardware configurations.
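The dense-versus-MoE distinction can be made concrete with a small active-parameter calculation. The MoE configuration below (8 experts, 2 routed per token, 20% shared weights) is purely hypothetical for illustration, not a FLUX or competitor configuration:

```python
def active_params_b(total_b, num_experts=1, experts_per_token=1, shared_frac=0.0):
    # Billions of parameters touched per token. A dense model routes every
    # token through all weights; an MoE activates only a subset of experts.
    shared = total_b * shared_frac
    routed = total_b - shared
    return shared + routed * experts_per_token / num_experts

dense = active_params_b(9)  # 9.0: all weights active, so memory use is flat
moe = active_params_b(9, num_experts=8, experts_per_token=2, shared_frac=0.2)
# 3.6: fewer FLOPs per token, but which experts fire varies per input,
# making memory traffic and latency less predictable.
```

The dense design trades peak efficiency for the predictability noted above: VRAM use and step latency are effectively constant across prompts.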
FLUX.2 [klein] 9B excels in scenarios where visual fidelity cannot be sacrificed for speed. While the 4B version is faster, the 9B variant shows marked improvements in complex compositional logic and the accuracy of rendered text.
The primary use case for the 9B model is local "live" generation. With inference times around 2 seconds on modern consumer GPUs, designers can adjust prompts and see results almost instantly. This is ideal for UI/UX mockups, character design, and concept art where the creative process is highly iterative.
Thanks to its unified architecture, FLUX.2 [klein] 9B is exceptionally capable at image-to-image tasks. Practitioners can use it for high-fidelity inpainting or style-consistent editing without needing to load separate specialized models or ControlNets.
The integration of the Qwen3 text embedder makes this one of the most capable 9B models for rendering legible text. It is suitable for creating social media assets, posters, and technical diagrams where accurate spelling and placement within the image are required.
To run FLUX.2 [klein] 9B locally, the primary constraint is VRAM. The distilled version of the model requires approximately 19.6 GB of VRAM for inference at FP16/BF16 precision. When accounting for overhead from the operating system and the text embedder, this makes 24GB VRAM the functional minimum for unquantized local use.
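A back-of-envelope check makes the 19.6 GB figure plausible. The sketch below counts weight memory only, in decimal gigabytes, under the assumption that weights dominate the footprint; activations and runtime buffers add roughly 1–2 GB on top:

```python
def weight_gb(params_billion, bytes_per_param):
    # Weight memory only, in decimal GB: (params * 1e9 * bytes) / 1e9.
    # Activations, attention buffers, and framework overhead are extra.
    return params_billion * bytes_per_param

transformer = weight_gb(9, 2)  # 18.0 GB of BF16 transformer weights
embedder = weight_gb(8, 2)     # 16.0 GB if the Qwen3 embedder stayed resident
```

18 GB of transformer weights plus activation overhead lines up with the reported ~19.6 GB, which suggests the text embedder is offloaded or loaded only transiently rather than held in VRAM alongside the transformer.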
For users who want to fit the model into smaller VRAM footprints or increase speed, quantization is essential.
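The savings scale directly with bit width. The bit widths below are common community formats (FP8/INT8, and 4-bit schemes such as NF4 or GGUF Q4), not officially published variants, and the figures are weight memory only:

```python
# Approximate weight footprint of the 9B transformer at common precisions.
# Decimal GB; the text embedder and runtime overhead are extra.
PARAMS_B = 9
footprints = {bits: PARAMS_B * bits / 8 for bits in (16, 8, 4)}
# {16: 18.0, 8: 9.0, 4: 4.5} -- 8-bit fits comfortably in 16 GB cards,
# and 4-bit brings the transformer within reach of 12 GB cards.
```

Quantization below 8 bits typically costs some fine detail and text-rendering accuracy, so it is a trade-off rather than a free win.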
The quickest way to get started is via Ollama or ComfyUI. ComfyUI provides the most granular control over the 4-step distillation process and allows for complex workflows involving the multi-reference fusion capabilities. For those looking for a "one-click" experience, Ollama support for the Klein family allows for easy management of the weights and local API access.
The 4B variant is roughly 40% faster and can run in 12 GB of VRAM (or even 8 GB with quantization). However, the 9B model is visibly sharper and handles complex prompts—specifically those involving multiple subjects or specific spatial relationships—with much higher reliability. If your hardware has 24 GB of VRAM, the 9B is the superior choice for production work.
SD 3.5 Medium is a strong competitor in the mid-sized category. While SD 3.5 may offer more community-driven fine-tunes and LoRAs currently, FLUX.2 [klein] 9B wins on raw inference speed due to its 4-step distillation. The text rendering on the 9B Klein is also generally considered more robust due to the larger Qwen3 embedder.
The Dev model is significantly more powerful but requires 80GB+ VRAM for unquantized inference, making it inaccessible for most local consumer setups without heavy quantization. FLUX.2 [klein] 9B provides roughly 80% of the visual quality of the Dev model at a fraction of the hardware cost and many times the speed.
Practitioners should note the FLUX Non-Commercial License. While the weights are open and available for local experimentation and fine-tuning, commercial deployment requires a separate agreement with Black Forest Labs. For a fully open-source alternative with an Apache 2.0 license, the 4B variant is the only option within the Klein family.