9B parameter distilled rectified flow transformer paired with an 8B Qwen3 text embedder. Sub-second inference at ~19.6GB VRAM.
FLUX.2 [klein] 9B is a high-speed, distilled rectified flow transformer developed by Black Forest Labs. It occupies a strategic middle ground in the FLUX ecosystem, offering a significant quality uplift over the lightweight 4B variant while maintaining the "sub-second" inference speeds that define the Klein family. Unlike the massive 32B FLUX.2 Dev, the 9B model is specifically engineered for local deployment on high-end consumer hardware, providing professional-grade image synthesis and text rendering without the latency typical of foundation models.
This model is a direct competitor to mid-sized diffusion models like Stable Diffusion 3.5 Medium, but it distinguishes itself through its step-distilled architecture. It is designed for practitioners who require real-time or near-real-time generation for interactive applications, rapid prototyping, or local creative workflows where waiting 15–30 seconds for a single generation is unacceptable.
The FLUX.2 [klein] 9B architecture is a dense rectified flow transformer consisting of 9 billion parameters. It is paired with an 8B Qwen3 text embedder, which provides the model with advanced linguistic comprehension and superior text-within-image rendering compared to previous generations.
The "distilled" version of this model uses 4-step guidance distillation to achieve its performance targets. By reducing the number of sampling steps required to reach convergence, the model can generate high-fidelity 1024x1024 images in as few as four iterations. For developers, this means the bottleneck shifts from compute cycles to VRAM throughput.
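The mechanics of few-step sampling can be seen in the sampler loop itself. The NumPy sketch below is a toy illustration, not the real FLUX pipeline: it integrates a rectified-flow ODE with exactly four Euler steps, using an assumed perfectly straight (rectified) velocity field. When the learned flow is nearly straight, a handful of coarse steps is enough to land on the target, which is what step distillation exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_sample(velocity, x0, num_steps=4):
    # Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (sample)
    # with a fixed, small number of Euler updates: the 4-step
    # regime used by guidance-distilled rectified-flow models.
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)  # one update per sampling step
    return x

# Toy velocity field: a perfectly rectified (straight-line) flow
# from the initial noise sample toward a fixed target.
x0 = rng.standard_normal(4)
target = np.full(4, 2.0)
velocity = lambda x, t: target - x0

x_final = euler_sample(velocity, x0, num_steps=4)
# For a straight path, four Euler steps reach the target exactly;
# a curved (undistilled) flow would need many more steps.
```

In the real model the velocity field is the 9B transformer, so each of the four steps is one full forward pass; this is why the bottleneck becomes memory bandwidth rather than step count.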
The model utilizes a unified architecture that handles three distinct tasks within a single weight set: text-to-image generation, image editing (including inpainting), and multi-reference fusion.
Because the model is dense rather than utilizing a Mixture of Experts (MoE) approach, every parameter is active during inference. This results in highly predictable memory usage and consistent performance across different hardware configurations.
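The dense-versus-MoE distinction can be made concrete with a small active-parameter calculation. The MoE configuration below (8 experts, 2 routed per token, 20% shared weights) is purely hypothetical for illustration, not a FLUX or competitor configuration:

```python
def active_params_b(total_b, num_experts=1, experts_per_token=1, shared_frac=0.0):
    # Billions of parameters touched per token. A dense model routes every
    # token through all weights; an MoE activates only a subset of experts.
    shared = total_b * shared_frac
    routed = total_b - shared
    return shared + routed * experts_per_token / num_experts

dense = active_params_b(9)  # 9.0: all weights active, so memory use is flat
moe = active_params_b(9, num_experts=8, experts_per_token=2, shared_frac=0.2)
# 3.6: fewer FLOPs per token, but which experts fire varies per input,
# making memory traffic and latency less predictable.
```

The dense design trades peak efficiency for the predictability noted above: VRAM use and step latency are effectively constant across prompts.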
FLUX.2 [klein] 9B excels in scenarios where visual fidelity cannot be sacrificed for speed. While the 4B version is faster, the 9B variant shows marked improvements in complex compositional logic and the accuracy of rendered text.
The primary use case for the 9B model is local "live" generation. With inference times around 2 seconds on modern consumer GPUs, designers can adjust prompts and see results almost instantly. This is ideal for UI/UX mockups, character design, and concept art where the creative process is highly iterative.
Thanks to its unified architecture, FLUX.2 [klein] 9B is exceptionally capable at image-to-image tasks. Practitioners can use it for high-fidelity inpainting or style-consistent editing without needing to load separate specialized models or ControlNets.
The integration of the Qwen3 text embedder makes this one of the most capable 9B models for rendering legible text. It is suitable for creating social media assets, posters, and technical diagrams where accurate spelling and placement within the image are required.
To run FLUX.2 [klein] 9B locally, the primary constraint is VRAM. The distilled version of the model requires approximately 19.6 GB of VRAM for inference at FP16/BF16 precision. When accounting for overhead from the operating system and the text embedder, this makes 24GB VRAM the functional minimum for unquantized local use.
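A back-of-envelope check makes the 19.6 GB figure plausible. The sketch below counts weight memory only, in decimal gigabytes, under the assumption that weights dominate the footprint; activations and runtime buffers add roughly 1–2 GB on top:

```python
def weight_gb(params_billion, bytes_per_param):
    # Weight memory only, in decimal GB: (params * 1e9 * bytes) / 1e9.
    # Activations, attention buffers, and framework overhead are extra.
    return params_billion * bytes_per_param

transformer = weight_gb(9, 2)  # 18.0 GB of BF16 transformer weights
embedder = weight_gb(8, 2)     # 16.0 GB if the Qwen3 embedder stayed resident
```

18 GB of transformer weights plus activation overhead lines up with the reported ~19.6 GB, which suggests the text embedder is offloaded or loaded only transiently rather than held in VRAM alongside the transformer.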
For users who want to fit the model into smaller VRAM footprints or increase speed, quantization is essential.
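The savings scale directly with bit width. The bit widths below are common community formats (FP8/INT8, and 4-bit schemes such as NF4 or GGUF Q4), not officially published variants, and the figures are weight memory only:

```python
# Approximate weight footprint of the 9B transformer at common precisions.
# Decimal GB; the text embedder and runtime overhead are extra.
PARAMS_B = 9
footprints = {bits: PARAMS_B * bits / 8 for bits in (16, 8, 4)}
# {16: 18.0, 8: 9.0, 4: 4.5} -- 8-bit fits comfortably in 16 GB cards,
# and 4-bit brings the transformer within reach of 12 GB cards.
```

Quantization below 8 bits typically costs some fine detail and text-rendering accuracy, so it is a trade-off rather than a free win.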
The quickest way to get started is via Ollama or ComfyUI. ComfyUI provides the most granular control over the 4-step distillation process and allows for complex workflows involving the multi-reference fusion capabilities. For those looking for a "one-click" experience, Ollama support for the Klein family allows for easy management of the weights and local API access.
The 4B variant is roughly 40% faster and can run in 12 GB of VRAM (or even 8 GB with quantization). However, the 9B model is visibly sharper and handles complex prompts—specifically those involving multiple subjects or specific spatial relationships—with much higher reliability. If your hardware has 24 GB of VRAM, the 9B is the superior choice for production work.
SD 3.5 Medium is a strong competitor in the mid-sized category. While SD 3.5 may offer more community-driven fine-tunes and LoRAs currently, FLUX.2 [klein] 9B wins on raw inference speed due to its 4-step distillation. The text rendering on the 9B Klein is also generally considered more robust due to the larger Qwen3 embedder.
The Dev model is significantly more powerful but requires 80GB+ VRAM for unquantized inference, making it inaccessible for most local consumer setups without heavy quantization. FLUX.2 [klein] 9B provides roughly 80% of the visual quality of the Dev model at a fraction of the hardware cost and many times the speed.
Practitioners should note the FLUX Non-Commercial License. While the weights are open and available for local experimentation and fine-tuning, commercial deployment requires a separate agreement with Black Forest Labs. For a fully open-source alternative with an Apache 2.0 license, the 4B variant is the only option within the Klein family.