Ideogram

Ideogram 4.0

Ideogram's first open-weight text-to-image model, released 2026-06-03. It is a 9.3B-parameter single-stream Diffusion Transformer with 34 layers, trained from scratch, that uses Qwen3-VL-8B-Instruct as its text encoder and generates at up to 2048 px per side. Ideogram reports a 0.97 X-Omni English OCR score, the strongest in-image text rendering among open-weight models at its size, alongside a 1062 designer-preference ELO that placed it first among open-weight models (second overall) on its launch leaderboard. Weights, inference code, and a prompting guide ship publicly under a non-commercial license, with a separate commercial license for production use.

9.3B params · Dense

Our Take

Best for: Open-source text-to-image workloads

A solid 9.3B-parameter dense image generator from Ideogram. Treat the modality benchmarks above as the leading indicator of fit — composite scoring across modalities is still maturing. Newly released, so production-readiness is still being shaken out.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 9.3B

Architecture: Dense

Provider: Ideogram

Download Size: 64.7 GB

Community

Monthly Downloads: 31.1K

Likes: 612

Last Updated: 23 days ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.

License

Ideogram 4 Non-Commercial

Performance & Scoring

Benchmarks

No benchmark data available for this model yet.

MBA Open Score

56.1 (BB)

Component	Weight	Score
Benchmark	45%	50.0
Popularity	25%	48.8
Efficiency	20%	64.7
Versatility	10%	85.0
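The composite figure is consistent with a weighted average of the four component scores listed above. The sketch below reproduces that arithmetic under that assumption; the dictionary and function names are illustrative and not taken from any published scoring code.

```python
# Component scores and weights as listed on this page.
COMPONENTS = {
    "benchmark": (50.0, 0.45),
    "popularity": (48.8, 0.25),
    "efficiency": (64.7, 0.20),
    "versatility": (85.0, 0.10),
}


def composite_score(components):
    """Weighted sum of (score, weight) pairs; assumes the weights sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(score * weight for score, weight in components.values())


score = composite_score(COMPONENTS)
print(round(score, 1))  # 56.1
```

Because Benchmark carries 45% of the weight and currently sits at 50.0, the composite would likely move most once benchmark data is recorded for this model.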

Hardware Compatibility

See which devices can run this model and at what quality level.

102 devices


Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	SS	6.2 GB
Acer Veriton GN100 AI Mini	Acer	SS	6.2 GB
AMD Instinct MI300X	AMD	SS	6.2 GB
AMD Instinct MI325X	AMD	SS	6.2 GB
AMD Instinct MI355X	AMD	SS	6.2 GB
AMD Radeon RX 7700 XT	AMD	SS	6.2 GB
AMD Radeon RX 7800 XT	AMD	SS	6.2 GB
AMD Radeon RX 7900 XT	AMD	SS	6.2 GB
AMD Radeon RX 7900 XTX	AMD	SS	6.2 GB
AMD Radeon RX 9070	AMD	SS	6.2 GB
AMD Radeon RX 9070 XT	AMD	SS	6.2 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	SS	6.2 GB
Apple M4	Apple	SS	6.2 GB
Apple M4 Max (40-core GPU)	Apple	SS	6.2 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	SS	6.2 GB
Apple M5	Apple	SS	6.2 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	SS	6.2 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	SS	6.2 GB
Apple Mac Mini (M1, 2020)	Apple	SS	6.2 GB
Apple Mac Mini (M2, 2023)	Apple	SS	6.2 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	SS	6.2 GB
Apple Mac Mini (M4, 2024)	Apple	SS	6.2 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	SS	6.2 GB
Apple Mac Studio (M1 Max, 2022)	Apple	SS	6.2 GB
Apple Mac Studio (M1 Ultra, 2022)	Apple	SS	6.2 GB

Rent in the Cloud

Cheapest current cloud rentals with at least 6 GB VRAM, refreshed hourly.

GPU	Provider	Tier	VRAM	Cost / GPU-hour
NVIDIA L4	Vast.ai	Spot	24 GB	$0.03
NVIDIA L4	Vast.ai	On-Demand	24 GB	$0.04
NVIDIA GeForce RTX 5060 Ti	Vast.ai	Spot	16 GB	$0.09
NVIDIA GeForce RTX 5060 Ti	Vast.ai	On-Demand	16 GB	$0.10
NVIDIA GeForce RTX 5070 Ti	Vast.ai	Spot	16 GB	$0.11

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.

About This Model

Overview

Ideogram 4.0 is Ideogram's first open-weight text-to-image model, released June 3, 2026. It is a 9.3B-parameter foundation model trained from scratch — not a fine-tune or distillation of any existing architecture. The model targets practitioners who need precise design output, particularly in-image text rendering, structured layout control, and multilingual typography.

Ideogram reports a 0.97 X-Omni English OCR score, the strongest in-image text rendering among open-weight models at its size, and a 1062 designer-preference ELO that placed it first among open-weight models and second overall on its launch leaderboard. The weights, inference code, and prompting guide ship under a non-commercial license, with a separate commercial license for production use.

Architecture & Technical Details

Ideogram 4.0 uses a single-stream Diffusion Transformer (DiT) with 34 layers. Text and image tokens share the same projections at every layer, a design that has become standard among recent open-weight image models. Two choices set it apart.

First, the text encoder is Qwen3-VL-8B-Instruct, a vision-language model used in text-only mode. The DiT consumes hidden states from 13 of its intermediate layers concatenated along the feature dimension, rather than a single final hidden state or no external encoder at all. This gives the model richer semantic understanding of prompts, particularly for complex descriptions and multilingual text.
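To make the feature-dimension concatenation concrete, here is a minimal sketch in plain Python. The layer count, hidden size, token count, and the choice of which 13 layers to tap are all toy assumptions; this page does not state the real tap points, and the function name is hypothetical.

```python
def concat_hidden_states(layer_outputs, layer_indices):
    """Concatenate per-token features from the selected layers.

    layer_outputs: list indexed by layer; each entry is a list of tokens,
                   and each token is a list of floats (length hidden_size).
    layer_indices: which layers to tap (a hypothetical choice here).
    Returns a list of tokens, each of length len(layer_indices) * hidden_size.
    """
    num_tokens = len(layer_outputs[layer_indices[0]])
    fused = []
    for t in range(num_tokens):
        features = []
        for idx in layer_indices:
            features.extend(layer_outputs[idx][t])
        fused.append(features)
    return fused


# Toy stand-in for an encoder: 36 layers, 5 tokens, hidden size 4.
NUM_LAYERS, NUM_TOKENS, HIDDEN = 36, 5, 4
layer_outputs = [
    [[float(layer)] * HIDDEN for _ in range(NUM_TOKENS)]
    for layer in range(NUM_LAYERS)
]

# Hypothetical tap points: 13 evenly spaced layers.
tap_layers = list(range(10, 36, 2))

fused = concat_hidden_states(layer_outputs, tap_layers)
print(len(tap_layers), len(fused), len(fused[0]))  # 13 5 52
```

The sequence length is unchanged; only the per-token feature width grows by a factor of 13, so the DiT's input projection has to accept that wider vector.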

Second, the model was trained exclusively on structured JSON captions with per-element styling, optional bounding boxes, and color palettes. The reference inference pipeline parses every prompt as JSON and validates it against the schema before generation. This means you can specify exact layout positions, text content, and color schemes in a structured format rather than hoping the model interprets natural language correctly.
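As a rough illustration of structured prompting, the sketch below builds a JSON prompt and runs a few structural checks before it would be handed to a pipeline. Every field name here (`description`, `palette`, `elements`, `bbox`, and so on) is a hypothetical stand-in, and the validator is not the reference pipeline's schema check; the official prompting guide defines the real format.

```python
import json

# Hypothetical prompt layout: field names are illustrative only.
prompt = {
    "description": "Minimalist coffee shop poster",
    "palette": ["#2B1B17", "#F5E6D3", "#C46A2B"],
    "elements": [
        {
            "type": "text",
            "content": "MORNING BREW",
            "style": {"font": "bold sans-serif", "color": "#F5E6D3"},
            "bbox": [0.10, 0.08, 0.90, 0.25],  # x0, y0, x1, y1 (normalized)
        },
        {
            "type": "image",
            "content": "steaming ceramic cup, top-down view",
            "bbox": [0.25, 0.35, 0.75, 0.85],
        },
    ],
}


def validate_prompt(raw):
    """Parse a JSON string and run a few structural checks.

    Returns the parsed dict, or raises ValueError on the first problem.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"prompt is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("prompt must be a JSON object")
    elements = data.get("elements")
    if not isinstance(elements, list) or not elements:
        raise ValueError("prompt needs a non-empty 'elements' list")
    for i, element in enumerate(elements):
        if not isinstance(element, dict) or "content" not in element:
            raise ValueError(f"element {i} has no 'content'")
        bbox = element.get("bbox")  # bounding boxes are optional
        if bbox is not None:
            if len(bbox) != 4 or not all(0.0 <= v <= 1.0 for v in bbox):
                raise ValueError(f"element {i} has a malformed bbox")
            x0, y0, x1, y1 = bbox
            if x0 >= x1 or y0 >= y1:
                raise ValueError(f"element {i} has an empty bbox")
    return data


validated = validate_prompt(json.dumps(prompt))
print(len(validated["elements"]))  # 2
```

Passing a plain sentence such as `"a poster of a cat"` to `validate_prompt` raises `ValueError`, which mirrors the idea that free-form text has to be wrapped in the structured format first.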

The pipeline consists of four components:

Frozen text encoder: Qwen3-VL-8B-Instruct (text-only mode)
Trained backbone: 9.3B single-stream DiT with 34 layers, QK-RMSNorm, MRoPE
Sampler: Euler flow-matching with asymmetric classifier-free guidance, 12/20/48 denoising steps
Frozen decoder: Flux VAE (8× spatial compression, unpatch 2×2 latent tokens)
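The sampler entry can be illustrated with a toy one-dimensional example. The sketch below runs fixed-step Euler integration of a flow-matching velocity field with standard classifier-free guidance. It makes several assumptions: time runs from 0 (noise) to 1 (data), the "model" is a closed-form velocity toward a point target, and the guidance is the ordinary symmetric form, since the asymmetric variant is not specified on this page. All function names are hypothetical.

```python
def guided_velocity(x, t, target_cond, target_uncond, scale):
    """Toy velocity field with standard classifier-free guidance.

    For a straight-line path toward a point target, the velocity at
    time t is (target - x) / (1 - t).
    """
    v_cond = (target_cond - x) / (1.0 - t)
    v_uncond = (target_uncond - x) / (1.0 - t)
    return v_uncond + scale * (v_cond - v_uncond)


def euler_sample(x0, num_steps, target_cond, target_uncond, scale):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always below 1, so the division above is safe
        x = x + dt * guided_velocity(x, t, target_cond, target_uncond, scale)
    return x


# The three step counts listed for the sampler.
results = {
    steps: euler_sample(x0=-3.0, num_steps=steps,
                        target_cond=2.0, target_uncond=0.0, scale=1.5)
    for steps in (12, 20, 48)
}
for steps, value in results.items():
    print(steps, round(value, 6))  # each result is close to 3.0
```

In this toy setup the guided sample lands at `target_uncond + scale * (target_cond - target_uncond)`, which shows how a guidance scale above 1 pushes the output past the conditional target. In a real model the step count trades speed against how closely the discrete steps track the learned trajectory.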

The model generates at up to 2048 px per side. Two quantizations are available at launch: nf4 (CUDA, supported in Diffusers) and fp8 (all hardware, no Diffusers support yet).
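The decoder figures imply how many image tokens the DiT processes at a given resolution. The following sketch works through that arithmetic, assuming each token covers one 2×2 patch of the 8×-downsampled latent grid and that image sides must be multiples of 16 px; the divisibility rule and the function name are assumptions, not documented constraints.

```python
VAE_DOWNSAMPLE = 8  # 8x spatial compression from the decoder description
PATCH = 2           # 2x2 latent patches per token


def token_grid(width_px, height_px):
    """Return (latent_w, latent_h, num_tokens) for a given image size."""
    stride = VAE_DOWNSAMPLE * PATCH  # pixels covered by one token per side
    if width_px % stride or height_px % stride:
        raise ValueError(f"image sides must be multiples of {stride}")
    latent_w = width_px // VAE_DOWNSAMPLE
    latent_h = height_px // VAE_DOWNSAMPLE
    num_tokens = (latent_w // PATCH) * (latent_h // PATCH)
    return latent_w, latent_h, num_tokens


for side in (1024, 2048):
    print(side, token_grid(side, side))
# 1024 (128, 128, 4096)
# 2048 (256, 256, 16384)
```

Doubling the side length quadruples the image token count, which is one reason 2048 px generation costs noticeably more than 1024 px.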

Capabilities & Use Cases

Ideogram 4.0 excels in three areas that have historically been weak points for open-weight image models:

In-image text rendering. The 0.97 X-Omni OCR score is not a benchmark artifact — the model renders English, Chinese, Japanese, Korean, and other scripts legibly within generated images. This is useful for poster mockups, signage, packaging design, UI screenshots, and any workflow where text must be readable, not decorative.

Structured layout control. The JSON prompting interface lets you specify bounding boxes for individual elements, define color palettes, and control per-element styling. This is valuable for design iteration where you need consistent placement across multiple generations — product shots with specific text placement, magazine layouts, or multi-element compositions.

Multilingual support. Because the text encoder is a vision-language model with multilingual training, Ideogram 4.0 handles non-English prompts and generates text in multiple scripts more reliably than models that rely on CLIP or T5 encoders alone.

Concrete use cases:

Marketing collateral with embedded copy (posters, flyers, social media graphics)
UI/UX mockups with readable interface text
Packaging design with product names and descriptions
Signage and wayfinding visualizations
Comic panels and storyboards with dialogue
Brand asset generation where typography is part of the design

Running Ideogram 4.0 Locally

Ideogram 4.0 is a 9.3B dense model, meaning all parameters are active during inference. This puts it in the same VRAM class as other 7B-13B image generation models, with the caveat that the full pipeline (encoder + DiT + VAE) increases total memory pressure.

Minimum hardware requirements:

nf4 quantization: 8-10 GB VRAM (RTX 3070, RTX 4060 Ti, M4 Pro with 16GB unified memory)
fp8 quantization: 12-16 GB VRAM (RTX 4070 Ti, RTX 4080, M4 Max with 24GB+)
fp16 (if available): 20+ GB VRAM (RTX 4090, A6000, dual-GPU setups)
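A quick back-of-the-envelope check helps put these tiers in context. The sketch below estimates storage for the 9.3B DiT weights alone at each precision, using nominal bytes per parameter. It ignores quantization metadata, the text encoder, the VAE, and activations, so the gap between these numbers and the listed minimums is presumably that extra overhead; that reading is an assumption, not a measured breakdown.

```python
PARAMS = 9.3e9  # DiT backbone parameter count from this page

# Nominal storage per parameter; real nf4 adds small per-block constants.
BYTES_PER_PARAM = {"nf4": 0.5, "fp8": 1.0, "fp16": 2.0}


def weight_gb(num_params, fmt):
    """Approximate weight storage in GB (10^9 bytes), ignoring overhead."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in ("nf4", "fp8", "fp16"):
    print(f"{fmt}: {weight_gb(PARAMS, fmt):.2f} GB")
# nf4: 4.65 GB
# fp8: 9.30 GB
# fp16: 18.60 GB
```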

Recommended hardware:

Best consumer GPU: RTX 4090 (24GB) — runs nf4 comfortably with room for batch processing or higher resolution
Best Apple Silicon: M4 Max with 48GB unified memory — handles fp8 quantization well
Best value: RTX 4070 Ti Super (16GB) — runs nf4 with good generation speed

Expected performance (nf4 on RTX 4090):

1024×1024 at 20 steps: ~3-5 seconds per image
2048×2048 at 20 steps: ~12-18 seconds per image
Throughput depends on denoising steps (12, 20, or 48)
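For planning batch jobs, the per-image latencies above convert directly to hourly throughput. The sketch below also scales the 20-step figures to the other step counts under a loudly stated assumption: latency is treated as proportional to the number of denoising steps, which ignores fixed costs such as text encoding and VAE decoding. Both function names are hypothetical.

```python
def images_per_hour(seconds_per_image):
    """Convert a per-image latency into images generated per hour."""
    return 3600.0 / seconds_per_image


def scale_by_steps(seconds_at_20_steps, steps):
    """Assumption: latency is proportional to the denoising step count."""
    return seconds_at_20_steps * steps / 20.0


# 1024x1024 at 20 steps: roughly 3 to 5 seconds per image (from the list above).
fast, slow = 3.0, 5.0
print(images_per_hour(slow), images_per_hour(fast))  # 720.0 1200.0

for steps in (12, 20, 48):
    low = scale_by_steps(fast, steps)
    high = scale_by_steps(slow, steps)
    print(f"{steps} steps: about {low:.1f} to {high:.1f} s per image")
```

Under this linear assumption, the 48-step setting would take about 7.2 to 12.0 seconds at 1024×1024; actual timings should be measured on the target hardware.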

Quantization recommendations:

nf4: Best for most users. Supported in Diffusers. Minimal quality loss vs fp16. Requires CUDA.
fp8: Runs on any hardware but lacks Diffusers support. Slightly better quality than nf4 at the cost of higher VRAM and no library integration yet.

The quickest way to start is via the official GitHub repository (ideogram-oss/ideogram4), which ships inference code and a prompting guide. Diffusers support for nf4 means you can integrate it into existing pipelines with minimal friction.

How It Compares

vs. FLUX.1-dev (12B): FLUX produces higher photorealism and aesthetic quality in natural scenes, but Ideogram 4.0 wins on text rendering and structured layout control by a wide margin. FLUX has no native JSON prompting or bounding-box support. If your work is photography-adjacent, FLUX is stronger. If you need design output with embedded text, Ideogram 4.0 is the better choice.

vs. SD3.5 Large (8B): SD3.5 has a larger ecosystem (LoRAs, ControlNets, community tools) and runs on more hardware configurations. Ideogram 4.0 produces higher-resolution output natively (2048px vs 1024px) and has superior text rendering. SD3.5 is the safer bet for general-purpose generation; Ideogram 4.0 is specialized for design work.

The tradeoff is license and ecosystem maturity. Ideogram 4.0 uses a non-commercial license by default, with a separate commercial license for production use. It also lacks the community-trained LoRAs and fine-tunes that have accumulated around SD3.5 and FLUX. For practitioners who need design-specific capabilities and are willing to work within the licensing terms, Ideogram 4.0 is currently the strongest open-weight option in its class.
