
Mistral's largest MoE model with 141B total / 39B active parameters. 8 experts, 64K context. Strong coding and multilingual performance. Apache 2.0 licensed.
Copy and paste this command to start running the model locally:

`ollama run mixtral:8x22b`
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 35.4 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 43.6 GB | Good | Best balance of size and quality for most use-cases |
| Q5_K_M | 47.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 52.1 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 61.9 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 98.9 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.
| Device | Tier | Speed | VRAM Used |
|---|---|---|---|
| NVIDIA H100 SXM5 80GB | SS | 61.9 tok/s | 43.6 GB |
| Google Cloud TPU v5p | SS | 51.1 tok/s | 43.6 GB |
| NVIDIA A100 SXM4 80GB | SS | 37.7 tok/s | 43.6 GB |
| NVIDIA H200 SXM 141GB | SS | 88.7 tok/s | 43.6 GB |
| NVIDIA B200 GPU | SS | 147.8 tok/s | 43.6 GB |
Mixtral 8x22B Instruct represents the current ceiling for high-performance, open-weight models that can be run on professional-grade local hardware. Released by Mistral AI under the Apache 2.0 license, this model is a Sparse Mixture of Experts (SMoE) that scales the architecture of the highly successful Mixtral 8x7B to a massive 141B total parameters.
Unlike dense models of similar size, Mixtral 8x22B Instruct only utilizes 39B active parameters during inference. This architectural choice positions it as a direct competitor to Llama 3 70B and Command R+, offering a distinct advantage in reasoning density and multilingual capabilities. For practitioners, this model serves as a "Goldilocks" solution: it provides the performance of a 100B+ parameter model while maintaining the inference speed typically associated with much smaller dense architectures.
The core of Mixtral 8x22B Instruct is its Mixture of Experts (MoE) design. It utilizes 8 distinct experts, with 2 experts being routed for every token. This results in a total parameter count of 141B, but an active parameter count of only 39B.
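These two headline numbers pin down the rough split between shared and per-expert weights. If S is the shared parameter count and E the size of one expert, then S + 8E = 141B (total) and S + 2E = 39B (active), which a few lines can solve. This is a back-of-envelope estimate that ignores how attention and routing weights are actually partitioned:

```python
total_b, active_b = 141, 39   # billions of parameters, from the model card
n_experts, n_active = 8, 2    # experts total / experts routed per token

# Solve S + 8E = 141 and S + 2E = 39 for expert size E and shared size S.
expert_b = (total_b - active_b) / (n_experts - n_active)  # ≈ 17B per expert
shared_b = active_b - n_active * expert_b                 # ≈ 5B shared
print(f"≈{expert_b:.0f}B per expert, ≈{shared_b:.0f}B shared")
```

So each expert is roughly a 17B-parameter FFN stack, with about 5B parameters (attention, embeddings, router) shared across all tokens.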
The primary benefit of this architecture is inference efficiency. In a standard dense model, every parameter participates in computing every token. In this SMoE setup, the router selects the most relevant experts for each token, so you get the knowledge and nuance of a 141B-parameter model at the throughput (tokens per second) of a 39B dense model.
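The routing step itself is simple: a small gating layer scores all 8 experts for each token, keeps the top 2, and mixes their outputs with softmax-normalized weights. A toy sketch of that selection in NumPy (shapes and the gating values are illustrative, not Mistral's actual implementation):

```python
import numpy as np

def top2_route(gate_logits: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pick the top-2 experts per token and softmax-renormalize their scores.

    gate_logits: (n_tokens, n_experts) raw scores from the gating layer.
    Returns (indices, weights), each of shape (n_tokens, 2).
    """
    top2 = np.argsort(gate_logits, axis=-1)[:, -2:]          # best two experts
    picked = np.take_along_axis(gate_logits, top2, axis=-1)  # their logits
    picked = picked - picked.max(axis=-1, keepdims=True)     # stable softmax
    weights = np.exp(picked) / np.exp(picked).sum(axis=-1, keepdims=True)
    return top2, weights

# One token, 8 experts: only 2 of the 8 expert FFNs will run for it.
logits = np.array([[0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]])
idx, w = top2_route(logits)
print(idx, w)  # experts 4 and 1 selected, weights sum to 1
```

Only the two selected expert FFNs execute for that token; the other six contribute nothing, which is where the 39B-active figure comes from.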
The 64K context length is a significant upgrade over earlier Mistral models, allowing for large-scale document analysis, extensive codebases to be loaded into memory, and complex multi-turn conversations without losing the thread of the dialogue.
Mixtral 8x22B Instruct is tuned specifically for instruction following and complex task execution. It is a text-only model that excels in environments where precision and logic are more important than creative prose.
This model is a top-tier choice for local development environments; its scores on coding and reasoning benchmarks rival many proprietary models.
Mistral AI has optimized this model for native-level performance in English, French, Italian, German, and Spanish. Beyond simple translation, it understands cultural nuances and technical terminology across these languages. Its math and logic capabilities likewise make it well suited to tasks that demand step-by-step reasoning.
Running a 141B parameter model locally is a significant hardware undertaking. The primary bottleneck is not compute power, but VRAM capacity. Because all 141B parameters must reside in memory—even if only 39B are active—you cannot treat this like a 40B dense model when calculating your hardware stack.
To run Mixtral 8x22B Instruct locally, you must account for the weights and the KV cache.
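As a rough budget, the weight footprint is fixed by the chosen quantization, while the KV cache grows linearly with context length. A sketch that takes the Q4_K_M figure from the quantization table above and adds an FP16 KV cache; the layer and head counts are taken from the published Mixtral 8x22B config (56 layers, 8 KV heads of dimension 128 under grouped-query attention) and should be treated as assumptions here:

```python
Q4_K_M_WEIGHTS_GB = 43.6  # from the quantization table above

def kv_cache_gb(context_len: int, n_layers: int = 56,
                n_kv_heads: int = 8, head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """FP16 KV cache size in GB: one K and one V vector per layer,
    per KV head, per token position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

for ctx in (8192, 32768, 65536):
    total = Q4_K_M_WEIGHTS_GB + kv_cache_gb(ctx)
    print(f"{ctx:>6} tokens: ≈ {total:.1f} GB total")
```

At the full 64K context the KV cache alone adds roughly 15 GB on top of the weights, which is why "fits the weights" and "fits the weights at full context" are different hardware questions.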
To achieve usable performance, you need a high-bandwidth memory interface.
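The bandwidth requirement can be made concrete with a memory-bound roofline: during decoding, every generated token must stream all ~39B active parameters through the memory bus, so throughput is capped at roughly bandwidth divided by active-parameter bytes. A sketch; the bandwidth figures and the ~4.85 bits/weight assumption for Q4_K_M are illustrative, and real throughput is lower due to KV-cache reads and kernel overhead:

```python
def roofline_tok_s(bandwidth_gb_s: float, active_params: float = 39e9,
                   bits_per_weight: float = 4.85) -> float:
    """Theoretical upper bound on decode tokens/sec for a memory-bound MoE:
    the active weights must be read from memory once per generated token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Nominal memory bandwidths in GB/s — assumed values, check vendor specs.
for name, bw in [("RTX 4090", 1008), ("A100 80GB", 2039), ("H100 SXM", 3350)]:
    print(f"{name}: ≤ {roofline_tok_s(bw):.0f} tok/s ceiling")
```

This is why the MoE design pays off: the ceiling scales with the 39B active parameters, not the 141B that sit in memory.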
The fastest way to test this model is via Ollama. Once you have the necessary VRAM, run:
`ollama run mixtral:8x22b`
This defaults to the 4-bit quantized version (Q4_K_M), which provides the best balance of speed and intelligence.
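Beyond the CLI, the running Ollama server exposes an HTTP API on `localhost:11434` that you can script against. A minimal sketch using only the Python standard library; the model tag must match whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def generate(prompt: str, model: str = "mixtral:8x22b") -> str:
    """Send a non-streaming request to a local Ollama server's
    /api/generate endpoint and return the completion text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires an Ollama server to be running (`ollama serve` or a prior
# `ollama run mixtral:8x22b`):
# print(generate("Summarize the Mixture of Experts architecture in one line."))
```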
When evaluating Mixtral 8x22B Instruct against other large open-weight models, two primary competitors emerge: Llama 3 70B and Command R+.
Llama 3 70B is a dense model, meaning it is significantly easier to fit onto hardware; you can run it comfortably on 2x RTX 3090s. However, Mixtral 8x22B often outperforms Llama 3 70B, particularly on multilingual tasks and mathematical reasoning.
Command R+ (104B) is specifically optimized for RAG (Retrieval Augmented Generation) and tool use. While Command R+ is excellent for enterprise search tasks, Mixtral 8x22B Instruct is generally considered a better general-purpose model, particularly for coding and raw mathematical reasoning. Mixtral's Apache 2.0 license is also more permissive than the licenses often attached to Cohere's weights for commercial applications.
If your goal is maximum throughput, the best GPU for Mixtral 8x22B Instruct is the NVIDIA A100 (80GB) or H100, ideally in a pair. For local developers on a budget, a used Mac Studio with an M2 Ultra (192GB RAM) provides the most seamless experience for running the model with a large context window without the power draw and heat of a quad-GPU PC build.