
Mistral AI's state-of-the-art MoE model with 675B total / 41B active parameters. Multimodal with vision. Designed for production-grade assistants and enterprise workflows.
Copy and paste this command to start running the model locally.
ollama run mistral-large-3:675b-cloud
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 57.7 GB | Low |
| Q4_K_M (Recommended) | 66.3 GB | Good |
| Q5_K_M | 70.4 GB | Very Good |
| Q6_K | 75.3 GB | Excellent |
| Q8_0 | 85.5 GB | Near Perfect |
| FP16 | 124.5 GB | Full |
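The table's figures come from the site's own calculator, but the general relationship between quantization level and memory footprint follows a simple rule of thumb: total parameters times average bits per weight, divided by eight, plus some runtime overhead. A minimal sketch (the 10% overhead fraction and the 4.8 bits/weight average for Q4_K_M are assumptions, not exact GGUF internals):

```python
# Rough VRAM estimate for quantized weights: total_params * bits_per_weight / 8,
# plus a flat fudge factor for KV cache and runtime buffers. A generic rule of
# thumb, not the exact formula behind the table above.

def estimate_vram_gb(total_params_b: float, bits_per_weight: float,
                     overhead_frac: float = 0.10) -> float:
    """VRAM in GB for the weights, plus a flat overhead fraction."""
    weight_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_frac) / 1e9

# Example: a 70B dense model at Q4_K_M (~4.8 bits/weight on average).
print(round(estimate_vram_gb(70, 4.8), 1))  # -> 46.2
```

The same arithmetic explains why halving the bit width (Q8 to Q4) roughly halves the memory bill while quality degrades far more gradually.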
See which devices can run this model and at what quality level.
| Device | Tier | Speed (Q4_K_M) | VRAM Used |
|---|---|---|---|
| NVIDIA B200 GPU | SS | 97.2 tok/s | 66.3 GB |
| Google TPU v7 (Ironwood) | SS | 89.7 tok/s | 66.3 GB |
| Gigabyte W775-V10-L01 | SS | 86.3 tok/s | 66.3 GB |
| SuperMicro Super AI Station | SS | 86.3 tok/s | 66.3 GB |
| NVIDIA H200 SXM 141GB | SS | 58.3 tok/s | 66.3 GB |
| NVIDIA H100 SXM5 80GB | SS | 40.7 tok/s | 66.3 GB |
| Google Cloud TPU v5p | SS | 33.6 tok/s | 66.3 GB |
| NVIDIA A100 SXM4 80GB | AA | 24.8 tok/s | 66.3 GB |
Energy cost of running this model on an Apple M4 Pro (14-core CPU, 20-core GPU, ~3.3 tok/s at Q4_K_M) versus flagship API pricing.
| Source | Cost per 1M tokens |
|---|---|
| Local (energy only): Mistral Large 3 675B on Apple M4 Pro (14-core CPU, 20-core GPU), ~3.3 tok/s, 60 W | $0.603 |
| GPT-5.5 (OpenAI: in $5.00 / out $30.00) | $12.50 |
| Claude Opus 4.7 Thinking (Anthropic: in $5.00 / out $25.00) | $11.00 |
| Gemini 3.1 Flash Lite Preview (Google: in $0.250 / out $1.50) | $0.625 |
| Grok 4.3 beta (xAI: in $3.00 / out $15.00) | $6.60 |
API prices blended at 70% input / 30% output.
Hardware amortisation not included. Run the full ROI calculator for payback math.
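The table's arithmetic is easy to reproduce. API figures blend input and output prices at 70/30; the local figure is watts divided by throughput, converted to kWh per million tokens, times an electricity rate. A sketch (the $0.12/kWh rate is an assumed average, which lands close to the table's $0.603):

```python
def blended_api_cost(in_price: float, out_price: float,
                     in_frac: float = 0.70) -> float:
    """Blend per-1M-token API prices at a fixed input/output ratio."""
    return in_frac * in_price + (1 - in_frac) * out_price

def local_energy_cost(watts: float, tok_per_s: float,
                      usd_per_kwh: float = 0.12) -> float:
    """Electricity cost per 1M tokens. The $0.12/kWh rate is an assumed
    average; the table's $0.603 figure implies a similar rate."""
    kwh_per_mtok = watts / tok_per_s * 1e6 / 3.6e6
    return kwh_per_mtok * usd_per_kwh

print(round(blended_api_cost(5.00, 30.00), 2))  # GPT-5.5 row -> 12.5
print(round(local_energy_cost(60, 3.3), 3))     # -> 0.606 at $0.12/kWh
```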
Mistral Large 3 675B represents Mistral AI’s most ambitious release to date, positioning itself as a premier open-weight alternative to closed-source frontier models like GPT-4o and Claude 3.5 Sonnet. As a multimodal Mixture of Experts (MoE) model, it integrates vision and text capabilities into a massive 675B parameter framework. Despite the high total parameter count, the MoE architecture ensures that only 41B parameters are active during any single inference pass, striking a balance between high-reasoning capacity and computational efficiency.
For developers and engineers looking to run Mistral Large 3 675B locally, this model serves as a production-grade backbone for complex agentic workflows, advanced coding tasks, and document analysis. Unlike its predecessors, Mistral Large 3 is natively multimodal, allowing for sophisticated image reasoning and OCR-heavy tasks alongside its industry-leading text performance. It is licensed under the Mistral Research License, making it accessible for deep evaluation and non-commercial local deployment.
The Mistral Large 3 675B MoE efficiency is the model's defining technical characteristic. By utilizing a "Sparse Mixture of Experts" approach, the model maintains a massive knowledge base (675B parameters) while only activating a fraction (41B parameters) for each token generated. This means that while the VRAM requirements remain high to store the full model weights, the compute cost per generated token scales with the 41B active parameters, giving throughput closer to a mid-sized dense model than to a dense 675B one.
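The sparse routing behind this can be sketched in a few lines. Mistral has not published Large 3's exact router, so this is a generic top-k gating illustration (the expert count and k=2 are illustrative): a learned router scores every expert, only the top-k run for a given token, and their outputs are mixed by softmax-normalized weights.

```python
import math

def route_token(router_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token and softmax-normalize their scores.
    Toy sketch of sparse MoE routing; real routers gate per layer over learned
    logits, but the principle is the same."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# 8 experts, but only 2 are activated for this token:
print(route_token([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]))
```

Because only the selected experts' weights participate in the matmuls, the arithmetic per token tracks the active parameter count, not the total.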
The 128k context length allows for extensive "needle-in-a-haystack" retrieval, making it suitable for analyzing entire codebases or long legal documents. Because it was trained with a 2025 cutoff, it possesses a more current world knowledge base than many competing models, reducing the reliance on RAG (Retrieval-Augmented Generation) for relatively recent events.
Mistral Large 3 675B excels in high-stakes environments where reasoning and instruction-following are non-negotiable. It is not a general-purpose "chat" toy; it is a tool for engineering and enterprise logic.
For coding, Mistral Large 3 675B demonstrates a sophisticated understanding of system architecture and multi-file refactoring. It handles Python, Rust, C++, and TypeScript with high proficiency. Its reasoning benchmark scores place it at the top of the open-weight category, particularly in complex mathematical proofing and logical deduction.
The native vision capability enables sophisticated image reasoning and OCR-heavy document workflows.
The model is optimized for tool-use and function-calling, allowing it to act as the "brain" for local agents that need to interact with APIs, databases, or local file systems. Furthermore, its multilingual training covers dozens of languages, including French, German, Spanish, Italian, Chinese, and Japanese, with native-level nuance.
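To make the tool-use point concrete, here is a minimal sketch of the pattern: a tool declared in the JSON-schema style common to function-calling APIs (Mistral's included), and a dispatcher that routes a model-emitted call to a local implementation. The `get_weather` tool and its fields are hypothetical, purely for illustration.

```python
import json

# Hypothetical tool declaration in the JSON-schema style used by most
# function-calling APIs. The tool name and fields are illustrative.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to a local implementation."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        return f"Weather in {call['arguments']['city']}: (stubbed result)"
    raise ValueError(f"unknown tool: {call['name']}")

# The model emits a structured call; the agent loop executes it locally:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

The model's role is only to emit the structured call; executing it against real APIs, databases, or the filesystem stays under your agent loop's control.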
The primary challenge for practitioners is the Mistral Large 3 675B hardware requirements. At 675B parameters, this model is massive and requires significant VRAM even when using aggressive quantization.
To determine the best GPU for Mistral Large 3 675B, you must first decide on your quantization level. Running this model in FP16 is impractical for almost all local setups (requiring ~1.3TB of VRAM).
How do you run a 675B-parameter model on consumer GPUs? You cannot run this on a single RTX 4090. To run a Q4 quantization locally, you generally need roughly 66 GB of memory for the weights alone, which means several high-VRAM GPUs or a unified-memory workstation.
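A back-of-envelope way to size a multi-GPU setup is to divide the quantized footprint by the usable VRAM per card, reserving some headroom on each for KV cache and runtime buffers. The 2 GB reserve below is an assumed figure, and real tensor-parallel splits add their own overhead:

```python
import math

def gpus_needed(model_vram_gb: float, vram_per_gpu_gb: float,
                reserve_gb: float = 2.0) -> int:
    """How many identical GPUs to hold the weights, reserving a little
    per-card headroom for KV cache and CUDA buffers. Back-of-envelope only."""
    usable = vram_per_gpu_gb - reserve_gb
    return math.ceil(model_vram_gb / usable)

# Q4_K_M footprint from the table above (66.3 GB) on 24 GB consumer cards:
print(gpus_needed(66.3, 24))  # -> 4
print(gpus_needed(66.3, 80))  # -> 1 (a single 80 GB datacenter card)
```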
The quickest way to deploy is via Ollama. Once you have the necessary VRAM, you can run:
ollama run mistral-large:675b
(Note: Ensure you are using a version of Ollama that supports the 2025 Mistral MoE architecture updates).
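Once the model is pulled, Ollama also exposes a local REST API on port 11434, which is how you would wire it into scripts or agents rather than the interactive CLI. A minimal stdlib-only sketch (the model tag must match what `ollama run` pulled; the prompt is illustrative):

```python
import json
import urllib.request

# Minimal client for Ollama's local REST API (default port 11434).

def build_request(model: str, prompt: str) -> urllib.request.Request:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("mistral-large:675b",
                    "Summarize the MoE architecture in one line.")
print(req.full_url)  # -> http://localhost:11434/api/generate

# To actually send (requires a running Ollama server with the model loaded):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```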
When choosing a 675B-parameter local model in 2025, Mistral Large 3 675B is often compared against Llama 3.1 405B and DeepSeek-V3.
For practitioners, the choice to run Mistral Large 3 675B locally usually comes down to the need for a 2025 training cutoff and superior vision capabilities integrated into a single, high-reasoning MoE framework.
