
Mistral's largest MoE model with 141B total / 39B active parameters. 8 experts, 64K context. Strong coding and multilingual performance. Apache 2.0 licensed.
Copy and paste this command to start running the model locally:

`ollama run mixtral:8x22b`
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 35.4 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 43.6 GB | Good | Best balance of size and quality for most use-cases |
| Q5_K_M | 47.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 52.1 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 61.9 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 98.9 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.
| Device | Tier | Speed | VRAM Used |
|---|---|---|---|
| NVIDIA H100 SXM5 80GB | SS | 61.9 tok/s | 43.6 GB |
| Google Cloud TPU v5p | SS | 51.1 tok/s | 43.6 GB |
| NVIDIA A100 SXM4 80GB | SS | 37.7 tok/s | 43.6 GB |
| NVIDIA H200 SXM 141GB | SS | 88.7 tok/s | 43.6 GB |
| NVIDIA B200 GPU | SS | 147.8 tok/s | 43.6 GB |
Mixtral 8x22B Instruct represents the current ceiling for high-performance, open-weight models that can be run on professional-grade local hardware. Released by Mistral AI under the Apache 2.0 license, this model is a Sparse Mixture of Experts (SMoE) that scales the architecture of the highly successful Mixtral 8x7B to a massive 141B total parameters.
Unlike dense models of similar size, Mixtral 8x22B Instruct only utilizes 39B active parameters during inference. This architectural choice positions it as a direct competitor to Llama 3 70B and Command R+, offering a distinct advantage in reasoning density and multilingual capabilities. For practitioners, this model serves as a "Goldilocks" solution: it provides the performance of a 100B+ parameter model while maintaining the inference speed typically associated with much smaller dense architectures.
The core of Mixtral 8x22B Instruct is its Mixture of Experts (MoE) design. It utilizes 8 distinct experts, with 2 experts being routed for every token. This results in a total parameter count of 141B, but an active parameter count of only 39B.
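These two headline numbers pin down the rough split between shared and per-expert weights. If S is the shared parameter count and E the size of one expert, then S + 8E = 141B (total) and S + 2E = 39B (active), which a few lines can solve. This is a back-of-envelope estimate that ignores how attention and routing weights are actually partitioned:

```python
total_b, active_b = 141, 39   # billions of parameters, from the model card
n_experts, n_active = 8, 2    # experts total / experts routed per token

# Solve S + 8E = 141 and S + 2E = 39 for expert size E and shared size S.
expert_b = (total_b - active_b) / (n_experts - n_active)  # ≈ 17B per expert
shared_b = active_b - n_active * expert_b                 # ≈ 5B shared
print(f"≈{expert_b:.0f}B per expert, ≈{shared_b:.0f}B shared")
```

So each expert is roughly a 17B-parameter FFN stack, with about 5B parameters (attention, embeddings, router) shared across all tokens.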
The primary benefit of this architecture is inference efficiency. In a standard dense model, every parameter participates in computing every token. In this SMoE setup, the router selects the most relevant experts for each token, so you get the knowledge and nuance of a 141B-parameter model at the throughput (tokens per second) of a 39B dense model.
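The routing step itself is simple: a small gating layer scores all 8 experts for each token, keeps the top 2, and mixes their outputs with softmax-normalized weights. A toy sketch of that selection in NumPy (shapes and the gating values are illustrative, not Mistral's actual implementation):

```python
import numpy as np

def top2_route(gate_logits: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pick the top-2 experts per token and softmax-renormalize their scores.

    gate_logits: (n_tokens, n_experts) raw scores from the gating layer.
    Returns (indices, weights), each of shape (n_tokens, 2).
    """
    top2 = np.argsort(gate_logits, axis=-1)[:, -2:]          # best two experts
    picked = np.take_along_axis(gate_logits, top2, axis=-1)  # their logits
    picked = picked - picked.max(axis=-1, keepdims=True)     # stable softmax
    weights = np.exp(picked) / np.exp(picked).sum(axis=-1, keepdims=True)
    return top2, weights

# One token, 8 experts: only 2 of the 8 expert FFNs will run for it.
logits = np.array([[0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]])
idx, w = top2_route(logits)
print(idx, w)  # experts 4 and 1 selected, weights sum to 1
```

Only the two selected expert FFNs execute for that token; the other six contribute nothing, which is where the 39B-active figure comes from.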
The 64K context length is a significant upgrade over earlier Mistral models, allowing for large-scale document analysis, extensive codebases to be loaded into memory, and complex multi-turn conversations without losing the thread of the dialogue.
Mixtral 8x22B Instruct is tuned specifically for instruction following and complex task execution. It is a text-only model that excels in environments where precision and logic are more important than creative prose.
This model is a top-tier choice for local development environments; its scores on coding and reasoning benchmarks rival many proprietary models.
Mistral AI has optimized this model for native-level performance in English, French, Italian, German, and Spanish. Beyond simple translation, it understands cultural nuances and technical terminology across these languages. Its math and logic capabilities likewise make it well suited to tasks that demand step-by-step reasoning.
Running a 141B parameter model locally is a significant hardware undertaking. The primary bottleneck is not compute power, but VRAM capacity. Because all 141B parameters must reside in memory—even if only 39B are active—you cannot treat this like a 40B dense model when calculating your hardware stack.
To run Mixtral 8x22B Instruct locally, you must account for the weights and the KV cache.
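As a rough budget, the weight footprint is fixed by the chosen quantization, while the KV cache grows linearly with context length. A sketch that takes the Q4_K_M figure from the quantization table above and adds an FP16 KV cache; the layer and head counts are taken from the published Mixtral 8x22B config (56 layers, 8 KV heads of dimension 128 under grouped-query attention) and should be treated as assumptions here:

```python
Q4_K_M_WEIGHTS_GB = 43.6  # from the quantization table above

def kv_cache_gb(context_len: int, n_layers: int = 56,
                n_kv_heads: int = 8, head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """FP16 KV cache size in GB: one K and one V vector per layer,
    per KV head, per token position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

for ctx in (8192, 32768, 65536):
    total = Q4_K_M_WEIGHTS_GB + kv_cache_gb(ctx)
    print(f"{ctx:>6} tokens: ≈ {total:.1f} GB total")
```

At the full 64K context the KV cache alone adds roughly 15 GB on top of the weights, which is why "fits the weights" and "fits the weights at full context" are different hardware questions.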
To achieve usable performance, you need a high-bandwidth memory interface.
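The bandwidth requirement can be made concrete with a memory-bound roofline: during decoding, every generated token must stream all ~39B active parameters through the memory bus, so throughput is capped at roughly bandwidth divided by active-parameter bytes. A sketch; the bandwidth figures and the ~4.85 bits/weight assumption for Q4_K_M are illustrative, and real throughput is lower due to KV-cache reads and kernel overhead:

```python
def roofline_tok_s(bandwidth_gb_s: float, active_params: float = 39e9,
                   bits_per_weight: float = 4.85) -> float:
    """Theoretical upper bound on decode tokens/sec for a memory-bound MoE:
    the active weights must be read from memory once per generated token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Nominal memory bandwidths in GB/s — assumed values, check vendor specs.
for name, bw in [("RTX 4090", 1008), ("A100 80GB", 2039), ("H100 SXM", 3350)]:
    print(f"{name}: ≤ {roofline_tok_s(bw):.0f} tok/s ceiling")
```

This is why the MoE design pays off: the ceiling scales with the 39B active parameters, not the 141B that sit in memory.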
The fastest way to test this model is via Ollama. Once you have the necessary VRAM, run:
`ollama run mixtral:8x22b`
This defaults to the 4-bit quantized version (Q4_K_M), which provides the best balance of speed and intelligence.
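Beyond the CLI, the running Ollama server exposes an HTTP API on `localhost:11434` that you can script against. A minimal sketch using only the Python standard library; the model tag must match whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def generate(prompt: str, model: str = "mixtral:8x22b") -> str:
    """Send a non-streaming request to a local Ollama server's
    /api/generate endpoint and return the completion text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires an Ollama server to be running (`ollama serve` or a prior
# `ollama run mixtral:8x22b`):
# print(generate("Summarize the Mixture of Experts architecture in one line."))
```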
When evaluating Mixtral 8x22B Instruct against other large open-weight models, two primary competitors emerge: Llama 3 70B and Command R+.
Llama 3 70B is a dense model, meaning it is significantly easier to fit onto hardware; you can run it comfortably on 2x RTX 3090s. However, Mixtral 8x22B often outperforms Llama 3 70B, particularly on multilingual tasks and mathematical reasoning.
Command R+ (104B) is specifically optimized for RAG (Retrieval Augmented Generation) and tool use. While Command R+ is excellent for enterprise search tasks, Mixtral 8x22B Instruct is generally considered a better general-purpose model, particularly for coding and raw mathematical reasoning. Mixtral's Apache 2.0 license is also more permissive than the licenses often attached to Cohere's weights for commercial applications.
If your goal is maximum throughput, the best GPU for Mixtral 8x22B Instruct is the NVIDIA A100 (80GB) or H100, ideally in a pair. For local developers on a budget, a used Mac Studio with an M2 Ultra (192GB RAM) provides the most seamless experience for running the model with a large context window without the power draw and heat of a quad-GPU PC build.