
Mistral AI's state-of-the-art MoE model with 675B total / 41B active parameters. Multimodal with vision. Designed for production-grade assistants and enterprise workflows.
Access model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 57.7 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 66.3 GB | Good | Best balance of size and quality for most use cases |
| Q5_K_M | 70.4 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 75.3 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 85.5 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 124.5 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.
| Device | Tier | Speed | VRAM Used |
|---|---|---|---|
| NVIDIA H200 SXM 141GB | SS | 58.3 tok/s | 66.3 GB |
| NVIDIA B200 | SS | 97.2 tok/s | 66.3 GB |
| Google Cloud TPU v5p | SS | 33.6 tok/s | 66.3 GB |
| NVIDIA H100 SXM5 80GB | SS | 40.7 tok/s | 66.3 GB |
| NVIDIA A100 SXM4 80GB | AA | 24.8 tok/s | 66.3 GB |
Mistral Large 3 675B represents Mistral AI’s most ambitious release to date, positioning itself as a premier open-weight alternative to closed-source frontier models like GPT-4o and Claude 3.5 Sonnet. As a multimodal Mixture of Experts (MoE) model, it integrates vision and text capabilities into a massive 675B parameter framework. Despite the high total parameter count, the MoE architecture ensures that only 41B parameters are active during any single inference pass, striking a balance between high-reasoning capacity and computational efficiency.
For developers and engineers looking to run Mistral Large 3 675B locally, this model serves as a production-grade backbone for complex agentic workflows, advanced coding tasks, and document analysis. Unlike its predecessors, Mistral Large 3 is natively multimodal, allowing for sophisticated image reasoning and OCR-heavy tasks alongside its industry-leading text performance. It is licensed under the Mistral Research License, making it accessible for deep evaluation and non-commercial local deployment.
MoE efficiency is the defining technical characteristic of Mistral Large 3 675B. Using a sparse Mixture of Experts approach, the model maintains a massive 675B-parameter knowledge base while activating only a 41B-parameter fraction for each generated token. VRAM requirements therefore remain high, since the full weights must stay resident, but tokens-per-second (TPS) throughput is significantly higher than a hypothetical 675B dense model would achieve.
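The sparse-activation idea can be illustrated with a toy top-k router (a minimal sketch, not Mistral's actual routing code; the gate, expert count, and dimensions here are invented for illustration):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through a sparse Mixture of Experts layer.

    Only the top_k highest-scoring experts run, so compute scales with
    top_k rather than the total expert count -- the reason a 675B-total
    model can generate with only ~41B parameters active per token.
    """
    logits = gate_w @ x                # one routing score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Weighted sum of the outputs of just the selected experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
# Each "expert" here is just a fixed linear map for illustration.
expert_mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda v, M=M: M @ v for M in expert_mats]

y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (8,)
```

With `top_k=2` out of 16 experts, only 2/16 of the expert compute runs per token, while all 16 expert weight matrices must still be held in memory, mirroring the high-VRAM / fast-inference trade-off described above.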
The 128k context length allows for extensive "needle-in-a-haystack" retrieval, making it suitable for analyzing entire codebases or long legal documents. Because it was trained with a 2025 cutoff, it possesses a more current world knowledge base than many competing models, reducing the reliance on RAG (Retrieval-Augmented Generation) for relatively recent events.
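Whether a codebase or legal document actually fits in the 128k window can be estimated up front. This sketch uses the common ~4-characters-per-token heuristic (the exact count depends on the model's tokenizer, and the 131,072-token figure and output reserve are assumptions):

```python
def fits_in_context(text: str, context_tokens: int = 131_072,
                    chars_per_token: float = 4.0,
                    reserved_for_output: int = 4_096) -> bool:
    """Rough check that a document fits a 128k-token context window.

    Uses the ~4-chars-per-token heuristic; the model's own tokenizer
    would give an exact count.
    """
    est_tokens = len(text) / chars_per_token
    return est_tokens <= context_tokens - reserved_for_output

# A ~200 KB source file comfortably fits; a ~1 MB dump does not.
print(fits_in_context("x" * 200_000))    # True
print(fits_in_context("x" * 1_000_000))  # False
```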
Mistral Large 3 675B excels in high-stakes environments where reasoning and instruction-following are non-negotiable. It is not a general-purpose "chat" toy; it is a tool for engineering and enterprise logic.
When evaluating Mistral Large 3 675B for coding, the model demonstrates a sophisticated understanding of system architecture and multi-file refactoring. It handles Python, Rust, C++, and TypeScript with high proficiency. Its reasoning benchmark scores place it at the top of the open-weight category, particularly in complex mathematical proofs and logical deduction.
The native vision capability enables sophisticated image reasoning and OCR-heavy document workflows.
The model is optimized for tool-use and function-calling, allowing it to act as the "brain" for local agents that need to interact with APIs, databases, or local file systems. Furthermore, its multilingual training covers dozens of languages, including French, German, Spanish, Italian, Chinese, and Japanese, with native-level nuance.
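The agent-side half of that tool-use loop is a dispatcher: the model emits a structured call, local code executes it, and the result is fed back. A minimal sketch, assuming a generic `{"name": ..., "arguments": {...}}` call format (the tool names and schema here are illustrative, not a specific API):

```python
import json

# Local "tools" the model may call; names and signatures are illustrative.
def read_file(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"read_file": read_file, "add": add}

def dispatch(tool_call_json: str) -> str:
    """Execute one model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return the result as JSON
    to be appended to the conversation."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})

# The model emits a call; the agent runs it and feeds the result back.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
# {"name": "add", "result": 5}
```

In a real agent, unknown tool names and argument validation errors would be returned to the model as error messages rather than raised, so it can self-correct.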
The primary challenge for practitioners is hardware. At 675B parameters, this model is massive and requires significant VRAM even when using aggressive quantization.
To determine the best GPU for Mistral Large 3 675B, you must first decide on your quantization level. Running this model in FP16 is impractical for almost all local setups (requiring ~1.3TB of VRAM).
How can you run a 675B model on consumer GPUs? You cannot run this on a single RTX 4090. To run a Q4 quantization locally, you generally need at least ~66 GB of VRAM for the weights (the Q4_K_M figure above), which in practice means an 80 GB-class datacenter GPU or several consumer cards pooled together.
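The parameter-count-to-VRAM arithmetic is a simple back-of-envelope rule: weight bytes equal parameters times bits-per-weight divided by eight, plus runtime overhead. This sketch is a generic estimator (the 10% overhead factor is an assumption, and it is not the exact methodology behind the table figures above):

```python
def vram_gb(params_billion: float, bits_per_weight: float,
            overhead: float = 1.10) -> float:
    """Back-of-envelope VRAM estimate for model weights.

    bytes = params * bits / 8, plus ~10% for KV cache and runtime
    buffers. Long contexts and larger batches add more in practice.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# 675B weights at FP16 (~16 bits/weight) land well past 1.3 TB,
# which is why full precision is impractical for local setups.
print(round(vram_gb(675, 16)))   # full precision
print(round(vram_gb(675, 4.5)))  # a Q4-class quantization
```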
The quickest way to deploy is via Ollama. Once you have the necessary VRAM, you can run:
```bash
ollama run mistral-large:675b
```
(Note: Ensure you are using a version of Ollama that supports the 2025 Mistral MoE architecture updates).
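Beyond the interactive CLI, Ollama also exposes a local REST endpoint (`POST /api/generate` on port 11434), which is how you would script against the model. A minimal sketch that only builds and prints the request body, since it assumes a running Ollama daemon to actually send it (the model tag mirrors the `ollama run` command above; adjust it to whatever tag your Ollama version pulls):

```python
import json

# Request body for Ollama's local REST endpoint (POST /api/generate).
payload = {
    "model": "mistral-large:675b",
    "prompt": "Summarize the attached contract in three bullet points.",
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To send it (requires a running Ollama daemon):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=body.encode(), method="POST",
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```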
When choosing a large local model in 2025, Mistral Large 3 675B is most often compared against Llama 3.1 405B and DeepSeek-V3.
For practitioners, the choice to run Mistral Large 3 675B locally usually comes down to the need for a 2025 training cutoff and superior vision capabilities integrated into a single, high-reasoning MoE framework.