12B parameter rectified flow transformer optimized for iterative, context-aware editing. Dual image-text input with strong character consistency across scenes; TensorRT-optimized for Blackwell.
FLUX.1 Kontext [dev] is a 12B parameter rectified flow transformer purpose-built for high-fidelity image editing. Developed by Black Forest Labs, this model bridges the gap between text-to-image generation and precise manipulation, offering a local, open-weight alternative to proprietary editing tools. Unlike standard diffusion models that struggle with maintaining identity or scene consistency during modifications, Kontext is optimized for iterative, context-aware editing and character preservation across varying environments.
As the developer-tier version of the Kontext family, it provides the same architectural backbone as the [pro] version but is licensed for non-commercial research and local development. It occupies a unique niche in the 12B parameter space, specifically targeting practitioners who need to perform complex global or local edits—such as changing a character's clothing, altering background elements, or adjusting lighting—without losing the structural integrity of the original subject.
The model utilizes a dense 12B parameter architecture based on the rectified flow transformer framework. This design is inherited from the original FLUX.1 [dev] lineage but is fine-tuned specifically for dual image-text inputs. Where standard models take a text prompt and noise to generate an image, Kontext accepts an existing image as a foundational "context" alongside text instructions, allowing for a more directed denoising process.
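This dual-input flow can be sketched with Hugging Face Diffusers. The sketch below assumes a recent Diffusers release that ships a `FluxKontextPipeline` class and the `black-forest-labs/FLUX.1-Kontext-dev` repository id; verify both against your installed version before relying on them.

```python
# Minimal sketch of dual image-text conditioning: an existing image is passed
# as the denoising context alongside a text instruction describing the edit.

def build_edit_kwargs(prompt: str, image, steps: int = 28,
                      guidance: float = 2.5) -> dict:
    """Bundle the dual inputs expected by the pipeline call."""
    return {
        "prompt": prompt,              # text instruction for the edit
        "image": image,                # context image guiding the denoising
        "num_inference_steps": steps,  # assumed defaults, tune to taste
        "guidance_scale": guidance,
    }

def run_edit(input_path: str, prompt: str):
    # Heavy imports are kept inside the function so the sketch stays
    # importable on machines without a GPU or the model weights.
    import torch
    from diffusers import FluxKontextPipeline
    from PIL import Image

    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    kwargs = build_edit_kwargs(prompt, Image.open(input_path))
    return pipe(**kwargs).images[0]
```

`run_edit("photo.png", "change the jacket to red leather")` would then return the edited image while leaving the rest of the scene intact.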
A significant technical highlight is the model's optimization for the NVIDIA Blackwell architecture. Black Forest Labs collaborated with NVIDIA to produce TensorRT-optimized weights, including BF16, FP8, and even FP4 variants. These optimizations allow the model to leverage Blackwell’s specific hardware accelerators, significantly reducing memory overhead and increasing inference speed. For users on standard consumer hardware, the model remains compatible with standard FLUX.1 inference pipelines, including Hugging Face Diffusers and ComfyUI.
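The practical impact of those precision variants is easy to estimate from the parameter count alone. A back-of-the-envelope sketch, counting weights only (activations, attention buffers, and the text encoders add further overhead):

```python
# Approximate weight footprint of a 12B parameter model at the precisions
# mentioned above. Weights only; real VRAM usage will be higher.

PARAMS = 12e9  # 12 billion parameters

def weight_gb(bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
# BF16 ~24 GB, FP8 ~12 GB, FP4 ~6 GB
```

The halving at each step is why the FP8 and FP4 variants matter so much on consumer cards: they move the weights from "barely fits" to "fits with headroom".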
FLUX.1 Kontext [dev] is not a general-purpose text-to-image generator; it is a specialized tool for image-to-image manipulation. Its primary strength lies in its ability to maintain "character consistency," a notoriously difficult task for open-weight models.
The model excels at both surgical edits (changing a specific object within a frame) and global style transfers. Because it understands the spatial context of the input image, it can add, remove, or modify elements while ensuring the new pixels blend naturally with the existing lighting and perspective.
Practitioners can use Kontext for multi-stage editing. For example, a developer can generate a base character, then use Kontext in subsequent passes to change the setting from a forest to a cityscape, then change the character's expression, all while keeping the character's face and proportions identical.
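The multi-stage workflow above amounts to a loop that feeds each output back in as the next context image. In this sketch, `apply_edit` is a hypothetical placeholder for whatever inference call (Diffusers, ComfyUI, a custom script) actually runs the model:

```python
# Sketch of iterative, context-aware editing: each pass reuses the previous
# output as the new context image, which is what lets the character's face
# and proportions carry across edits.

def apply_edit(image: str, instruction: str) -> str:
    # Stub: a real implementation would invoke FLUX.1 Kontext [dev] here.
    return f"{image} + [{instruction}]"

def multi_stage_edit(base_image: str, instructions: list[str]) -> list[str]:
    """Apply edits sequentially, returning every intermediate result."""
    history = [base_image]
    current = base_image
    for instruction in instructions:
        current = apply_edit(current, instruction)  # output -> next context
        history.append(current)
    return history

stages = multi_stage_edit(
    "base_character.png",
    ["move the scene from a forest to a cityscape",
     "make the character smile"],
)
```

Keeping the intermediate results in `history` makes it cheap to branch: rerun the loop from any earlier stage with a different instruction list.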
While the [dev] weights are under a non-commercial license, they serve as a perfect local sandbox for developers building apps that will eventually scale to the [pro] API, enabling complex ComfyUI workflows or custom inference scripts to be developed without incurring cloud API costs during the R&D phase.
To run FLUX.1 Kontext [dev] locally, your primary bottleneck will be VRAM. The model weights alone are substantial, and the bidirectional attention mechanisms inherent in the transformer architecture require significant memory headroom during inference.
For most practitioners, the FP8 scaled version is the best balance of quality and speed. The full BF16 weights (approx. 24GB) are often too large for a single consumer GPU once you account for the VRAM required by the UI and operating system.
On an RTX 4090 using FP8 weights, a standard 20-30 step edit can typically be completed in under 40 seconds. On older hardware or with aggressive CPU offloading, expect significantly longer waits, with a single edit often exceeding 50 seconds.
FLUX.1 Kontext [dev] enters a market previously dominated by models like ByteDance's Bagel or HiDream-E1.
HiDream-E1 has been a popular choice for multimodal editing, but practitioners often report issues with consistency and "hallucinated" artifacts during complex edits. Kontext [dev] generally demonstrates superior character preservation and a more sophisticated understanding of global lighting.
While Gemini-Flash is a closed-source, API-only model, Kontext [dev] provides comparable, and in some benchmarks superior, editing precision. Kontext's primary advantages are freedom from the "censorship" and "guardrail" interference that often constrains proprietary models, alongside the latency benefit of local execution with no network round-trips.
When choosing between Kontext and a standard FLUX.1 [dev] model with a ControlNet, Kontext is usually the better choice for pure editing. While ControlNets can guide generation, they often struggle with the "contextual" part of the edit—ensuring the new elements feel lived-in and stylistically matched to the original image. Kontext handles this natively within the 12B parameter transformer.