32B parameter rectified flow transformer for text-to-image generation and editing. Supports up to 4MP resolution and multi-reference combinations in a single checkpoint.
No benchmark data available for this model yet.
See which devices can run this model and at what quality level.
| Device | Vendor | Quality | Memory Required |
|---|---|---|---|
| GIGABYTE AI TOP ATOM | Gigabyte | SS | 20.1 GB |
| Gigabyte W775-V10-L01 | Gigabyte | SS | 20.1 GB |
| Google Cloud TPU v5p | Google | SS | 20.1 GB |
FLUX.2 [dev] is a state-of-the-art 32B parameter rectified flow transformer designed by Black Forest Labs for high-fidelity text-to-image generation and complex image editing. Positioned as the open-weight successor to the original FLUX.1 series, this model bridges the gap between research-grade weights and production-ready output. It competes directly with top-tier proprietary models like Midjourney v6 and DALL-E 3, but with the distinct advantage of being runnable on local consumer hardware.
Unlike standard diffusion models, FLUX.2 [dev] utilizes a rectified flow architecture that excels at spatial reasoning and prompt adherence. It is specifically engineered for practitioners who require precise control over composition, typography, and photorealism without relying on cloud-based APIs. The "dev" variant is a guidance-distilled version of the model, optimized to provide high-quality results in fewer steps than the "pro" versions while maintaining the full 32B parameter density.
The core of FLUX.2 [dev] is a 32B dense parameter rectified flow transformer. This architecture represents a shift from traditional U-Net structures, offering better scaling laws and more stable training for high-resolution outputs.
The model uses a flow-matching framework, which simplifies the path between noise and the final image. This results in cleaner textures and more accurate object placement compared to older latent diffusion techniques. Because it is a dense 32B model, every parameter is active during every inference step. While this demands more VRAM than a Mixture-of-Experts (MoE) or a smaller 12B model, it provides a level of detail and "understanding" of complex prompts that smaller models cannot match.
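The straight-path idea behind flow matching can be sketched in a few lines of NumPy. This is an illustrative toy, not FLUX.2's actual training or sampling code: rectified flow trains the network to predict a constant velocity along the straight line between noise and data, and sampling integrates that velocity field.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Point on the straight path between noise x0 and data x1 at time t.

    Rectified flow trains the model to predict the constant velocity
    (x1 - x0) along this line, rather than the curved trajectories of
    classic diffusion schedules.
    """
    xt = (1.0 - t) * x0 + t * x1      # linear interpolation
    v_target = x1 - x0                # velocity target, constant in t
    return xt, v_target

def euler_sample(v_model, x0, steps=25):
    """Integrate dx/dt = v_model(x, t) from t=0 (noise) to t=1 (image)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * v_model(x, t)    # one Euler step along the flow
    return x

# Toy check: with the exact (constant) velocity field, Euler integration
# from the noise sample lands exactly on the data sample.
rng = np.random.default_rng(0)
noise, image = rng.normal(size=4), np.ones(4)
recovered = euler_sample(lambda x, t: image - noise, noise, steps=25)
assert np.allclose(recovered, image)
```

Because the target path is straight, fewer integration steps introduce less error than with curved diffusion trajectories, which is part of why the guidance-distilled "dev" variant can produce good results in 20-25 steps.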
FLUX.2 [dev] incorporates a vision-language model (VLM) with a context window of approximately 32,000 tokens. In practical terms, this allows the model to process extremely long, descriptive prompts and complex layout instructions. It can interpret multi-paragraph descriptions of a scene, maintaining coherence across various elements like lighting, specific character attire, and background details.
A standout technical feature is the native support for multi-reference combinations within a single checkpoint. Traditional models often require LoRAs or ControlNets to maintain character or style consistency. FLUX.2 [dev] can ingest up to 10 reference images simultaneously, allowing users to specify a character from one image, a clothing style from another, and a lighting environment from a third, all without additional fine-tuning.
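One way to picture in-context multi-reference conditioning (the exact FLUX.2 mechanism is not documented here, so shapes and the patching scheme below are illustrative assumptions) is that each reference image is flattened into patch tokens and concatenated with the text tokens into one conditioning sequence the transformer attends over:

```python
import numpy as np

def build_context(text_tokens, reference_images, d_model=64, patch=8):
    """Illustrative sketch: flatten each reference image into patch
    tokens and concatenate them with the text tokens into a single
    conditioning sequence. All shapes are hypothetical."""
    sequences = [text_tokens]
    for img in reference_images:              # img: (H, W, C) float array
        h, w, c = img.shape
        patches = img.reshape(h // patch, patch, w // patch, patch, c)
        patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
        # project each patch to the model width (random matrix as a stand-in
        # for a learned projection)
        proj = np.random.default_rng(0).normal(size=(patches.shape[1], d_model))
        sequences.append(patches @ proj)
    return np.concatenate(sequences, axis=0)  # (text + all ref tokens, d_model)

text = np.zeros((77, 64))                         # placeholder text tokens
refs = [np.zeros((32, 32, 3)) for _ in range(3)]  # three reference images
ctx = build_context(text, refs)
# 77 text tokens + 3 images * (32/8)^2 = 48 patch tokens -> 125 total
assert ctx.shape == (125, 64)
```

The practical consequence of sequence concatenation is that every reference contributes tokens to the same attention context, so consistency comes from attention rather than from auxiliary adapters like LoRAs or ControlNets.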
FLUX.2 [dev] is built for professional creative workflows where "good enough" is insufficient. It excels in areas where previous open-weight models typically struggled, such as legible typography, photorealistic detail, and precise compositional control.
To run FLUX.2 [dev] locally, your primary bottleneck will be VRAM. As a 32B dense model, the memory footprint is significant, especially once the VAE and the text-encoder stack are loaded alongside the transformer.
For most practitioners, Q4_K_M (4-bit) is the sweet spot: it shrinks the model enough to fit on 16-24 GB cards while retaining the vast majority of the full-precision generation quality. If you are doing professional photography-style work where skin pores and fabric textures are critical, FP8 or Q8_0 on 24 GB+ hardware is recommended.
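The weight footprint at each precision is simple arithmetic over the 32B parameter count. The bits-per-weight figures below are approximations (GGUF K-quants keep scales and some tensors at higher precision, so the effective average is above the nominal bit width):

```python
PARAMS = 32e9  # dense parameter count

# Approximate effective bits per weight for common formats.
BITS_PER_WEIGHT = {
    "BF16/FP16": 16.0,
    "FP8": 8.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,  # approximate K-quant average
}

for fmt, bits in BITS_PER_WEIGHT.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{fmt:>9}: ~{gib:5.1f} GiB for the transformer weights alone")
# FP16 lands near 60 GiB, FP8/Q8_0 near 30 GiB, and Q4_K_M near 18 GiB.
# Text encoders, the VAE, activations, and framework overhead add several
# more GiB on top of the raw weight footprint.
```

This is why 4-bit quantization is the practical floor for 16-24 GB consumer cards: the ~18 GiB Q4_K_M footprint fits a 24 GB card outright and a 16 GB card only with partial CPU offloading.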
On an RTX 4090 using optimized environments like ComfyUI or Forge, you can expect a 1024x1024 image in approximately 10–15 seconds using 20–25 steps. Native 4MP generations will take significantly longer, often 45–90 seconds depending on the complexity and quantization level.
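The gap between 1MP and 4MP render times follows from the token count the transformer must attend over. Assuming a FLUX.1-style 8x VAE downscale with 2x2 latent patching (an assumption for FLUX.2, not a confirmed detail):

```python
def latent_tokens(width, height, vae_factor=8, patch=2):
    """Transformer token count for a given resolution, assuming an 8x
    VAE downscale and 2x2 latent patching (assumed, FLUX.1-style)."""
    f = vae_factor * patch
    return (width // f) * (height // f)

base = latent_tokens(1024, 1024)      # 1MP -> 4096 tokens
four_mp = latent_tokens(2048, 2048)   # ~4MP -> 16384 tokens
print(base, four_mp, four_mp // base)
# Self-attention cost grows roughly quadratically with token count, so
# 4x the tokens implies up to ~16x attention cost per step, which is why
# native 4MP renders take far longer than a simple 4x of the 1MP time.
```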
The quickest way to get started is via Ollama (for CLI/backend integration) or ComfyUI, which has robust support for the FLUX.2 architecture and its multi-reference inputs.
FLUX.2 [dev] sits in a unique position between lightweight models and massive enterprise systems.
For developers building local AI tools, FLUX.2 [dev] is currently the benchmark for what is possible with open-weight image generation on consumer-grade GPUs.