Mistral AI

Mistral Large 3 675B

Mistral AI's state-of-the-art MoE model with 675B total / 41B active parameters. Multimodal with vision. Designed for production-grade assistants and enterprise workflows.

675B paramsMoE128K ctxText + Vision

View on Hugging Face

Run with Ollama Official Page

Our Take

Best for: Datacenter-tier inference at frontier quality

A solid 675B-parameter MoE language model from Mistral AI. A pragmatic middle-ground choice when you need open weights without a flagship-sized footprint.

Run this onNVIDIA A100 SXM4 80GBCheapest card in our directory with comfortable headroom (80 GB) for this model at Q4 (~66.3 GB).

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Capabilities

Chat

Code Generation

Vision

Reasoning

Function Calling

Multilingual

Math

Instruction Following

Model Specifications

Parameters675B

Active Params41B

ArchitectureMoE

Context Length128K tokens

ModalityText + Vision

Training Cutoff2025

ProviderMistral AI

Download Size681.5 GB

Community

Monthly Downloads1.6K

Likes228

Last Updated5 months ago

Quick Start

Run with Ollama

Copy and paste this command to start running the model locally.

ollama run mistral-large-3:675b-cloud

Download from Hugging Face

Access model weights, configuration files, and documentation.

Download from Hugging Face

License

Mistral Research LicenseView Full License

Performance & Scoring

Benchmarks

Arena Score

89.5

Overall Score

61.7BB

Benchmark40%

89.5

Popularity25%

15.4

Efficiency20%

50.8

Versatility15%

79.0

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality
Q2_K	57.7 GB	Low	Aggressive quantization — smallest size, noticeable quality loss
Q4_K_MRecommended	66.3 GB	Good	Best balance of size and quality for most use-cases
Q5_K_M	70.4 GB	Very Good	Slightly better quality than Q4 with moderate size increase
Q6_K	75.3 GB	Excellent	Near-lossless quality with manageable size
Q8_0	85.5 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	124.5 GB	Full	Full 16-bit floating point — maximum quality, largest size

Hardware Compatibility

See which devices can run this model and at what quality level.

Hide F tierOnly featured devices

102 devices


Intel Gaudi 3 AI AcceleratorIntel	SS	44.9 tok/s	66.3 GB
NVIDIA H200 SXM 141GBNVIDIA	SS	58.3 tok/s	66.3 GB
AMD Instinct MI300XAMD	SS	64.4 tok/s	66.3 GB
Google TPU v7 (Ironwood)Google	SS	89.7 tok/s	66.3 GB
NVIDIA B200 GPUNVIDIA	SS	97.2 tok/s	66.3 GB
Google Cloud TPU v5pGoogle	SS	33.6 tok/s	66.3 GB
AMD Instinct MI325XAMD	SS	72.9 tok/s	66.3 GB
AMD Instinct MI355XAMD	SS	97.2 tok/s	66.3 GB
Intel Gaudi 2 AI AcceleratorIntel	SS	29.8 tok/s	66.3 GB
NVIDIA H100 SXM5 80GBNVIDIA	SS	40.7 tok/s	66.3 GB
ASUS ExpertCenter Pro ET900N G3ASUS	SS	86.3 tok/s	66.3 GB
Dell Pro Max with GB300Dell	SS	86.3 tok/s	66.3 GB
Gigabyte W775-V10-L01Gigabyte	SS	86.3 tok/s	66.3 GB
HP ZGX Fury AI StationHP	SS	86.3 tok/s	66.3 GB
MSI XpertStation WS300MSI	SS	86.3 tok/s	66.3 GB
SuperMicro Super AI StationSuperMicro	SS	86.3 tok/s	66.3 GB
NVIDIA A100 SXM4 80GBNVIDIA	AA	24.8 tok/s	66.3 GB
Apple Mac Studio (M1 Ultra, 2022)Apple	AA	9.7 tok/s	66.3 GB
Apple M5 Max (18-core CPU, 40-core GPU)Apple	BB	7.5 tok/s	66.3 GB
MacBook Pro 16-inch M5 Max (2026)Apple	BB	7.5 tok/s	66.3 GB
MacBook Pro 16" M5 Max (2026)Apple	BB	7.5 tok/s	66.3 GB
Apple M4 Max (40-core GPU)Apple	BB	6.6 tok/s	66.3 GB
Apple Mac Studio (M4 Max, 2025)Apple	BB	6.6 tok/s	66.3 GB
MacBook Pro 14-inch M4 Max (2024)Apple	BB	6.6 tok/s	66.3 GB
MacBook Pro 16" M4 Max (2024)Apple	BB	6.6 tok/s	66.3 GB

Rows per page

Page 1 of 5

Run Locally vs API

Energy cost on Apple M4 Pro (14-core CPU, 20-core GPU) (~3.3 tok/s, Q4_K_M) vs flagship API pricing.

Source	Cost per 1M tokens
Local (energy only)Mistral Large 3 675B on Apple M4 Pro (14-core CPU, 20-core GPU) · ~3.3 tok/s · 60W	$0.603
GPT-5.5OpenAI · in $5.00 · out $30.00	$12.50
Claude Opus 4.7 ThinkingAnthropic · in $5.00 · out $25.00	$11.00
Gemini 3.5 FlashGoogle · in $1.50 · out $9.00	$3.75
Grok 4.3xAI · in $1.25 · out $2.50	$1.63

API prices blended at 70% input / 30% output.

Hardware amortisation not included. Run the full ROI calculator for payback math.

Run the full ROI calculator

Rent in the Cloud

Cheapest current cloud rentals with at least 66 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA A100 80GB SXMVast.ai · On-Demand · 80 GB VRAM	$0.27
AMD Instinct MI300XRunPod · Community · 192 GB VRAM	$0.50
NVIDIA H200 NVLRunPod · Community · 141 GB VRAM	$0.50
NVIDIA A100 80GB SXMVast.ai · Spot · 80 GB VRAM	$0.53
NVIDIA A100 80GB PCIeVast.ai · Spot · 80 GB VRAM	$1.02

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.

See the full price index

About This Model

Mistral Large 3 675B represents Mistral AI’s most ambitious release to date, positioning itself as a premier open-weight alternative to closed-source frontier models like GPT-4o and Claude 3.5 Sonnet. As a multimodal Mixture of Experts (MoE) model, it integrates vision and text capabilities into a massive 675B parameter framework. Despite the high total parameter count, the MoE architecture ensures that only 41B parameters are active during any single inference pass, striking a balance between high-reasoning capacity and computational efficiency.

For developers and engineers looking to run Mistral Large 3 675B locally, this model serves as a production-grade backbone for complex agentic workflows, advanced coding tasks, and document analysis. Unlike its predecessors, Mistral Large 3 is natively multimodal, allowing for sophisticated image reasoning and OCR-heavy tasks alongside its industry-leading text performance. It is licensed under the Mistral Research License, making it accessible for deep evaluation and non-commercial local deployment.

Architecture and Technical Details

The Mistral Large 3 675B MoE efficiency is the model's defining technical characteristic. By utilizing a "Sparse Mixture of Experts" approach, the model maintains a massive knowledge base (675B parameters) while only activating a fraction (41B parameters) for each token generated. This means that while the VRAM requirements remain high to store the full model weights, the Mistral Large 3 675B tokens per second (TPS) performance is significantly faster than a hypothetical 675B dense model.

Key Technical Specifications:

Total Parameters: 675B
Active Parameters: 41B per token
Context Window: 128,000 tokens
Modality: Multimodal (Text + Vision)
Training Cutoff: 2025
Architecture: Sparse MoE with native vision encoders

The 128k context length allows for extensive "needle-in-a-haystack" retrieval, making it suitable for analyzing entire codebases or long legal documents. Because it was trained with a 2025 cutoff, it possesses a more current world knowledge base than many competing models, reducing the reliance on RAG (Retrieval-Augmented Generation) for relatively recent events.

Capabilities and Use Cases

Mistral Large 3 675B excels in high-stakes environments where reasoning and instruction-following are non-negotiable. It is not a general-purpose "chat" toy; it is a tool for engineering and enterprise logic.

Advanced Reasoning and Coding

When evaluating Mistral Large 3 675B for coding, the model demonstrates a sophisticated understanding of system architecture and multi-file refactoring. It handles Python, Rust, C++, and TypeScript with high proficiency. Its Mistral Large 3 675B reasoning benchmark scores place it at the top of the open-weight category, particularly in complex mathematical proofing and logical deduction.

Multimodal Workflows

The native vision capability enables:

Technical Diagram Interpretation: Converting whiteboard architecture sketches into Mermaid.js or Terraform code.
Document Processing: Extracting structured JSON from complex, non-standardized invoices or medical forms.
UI/UX Auditing: Reviewing frontend screenshots for accessibility or design-to-code alignment.

Function Calling and Multilingual Support

The model is optimized for tool-use and function-calling, allowing it to act as the "brain" for local agents that need to interact with APIs, databases, or local file systems. Furthermore, its multilingual training covers dozens of languages, including French, German, Spanish, Italian, Chinese, and Japanese, with native-level nuance.

Running Mistral Large 3 675B Locally

The primary challenge for practitioners is the Mistral Large 3 675B hardware requirements. At 675B parameters, this model is massive and requires significant VRAM even when using aggressive quantization.

Mistral Large 3 675B VRAM Requirements

To determine the best GPU for Mistral Large 3 675B, you must first decide on your quantization level. Running this model in FP16 is impractical for almost all local setups (requiring ~1.3TB of VRAM).

4-bit Quantization (Q4_K_M): Requires ~380GB - 400GB of VRAM. This is the best quantization for Mistral Large 3 675B, offering the best balance between perplexity loss and memory footprint.
2-bit Quantization (IQ2_XS): Requires ~180GB - 200GB of VRAM. This is the bare minimum for "functional" use, though some intelligence degradation occurs.

Recommended Hardware Configurations

How to run 675B model on consumer GPUs? You cannot run this on a single RTX 4090. To run a Q4 quantization locally, you generally need:

Multi-GPU Workstation: 8x NVIDIA RTX 6000 Ada (48GB each) or 16x RTX 4090 (24GB each) via PCIe expansion/NVLink.
Mac Studio/Mac Pro: An M2 Ultra or M4 Max with 192GB of Unified Memory can run a 2-bit or 3-bit version, but for 4-bit, you would need a cluster or a specialized high-memory Mac configuration if available.
Server Grade: A single node of 8x H100 or A100 (80GB) is the standard production environment for this model.

Getting Started with Ollama

The quickest way to deploy is via Ollama. Once you have the necessary VRAM, you can run:

ollama run mistral-large:675b

(Note: Ensure you are using a version of Ollama that supports the 2025 Mistral MoE architecture updates).

How It Compares

When choosing a local AI model 675B parameters 2025, Mistral Large 3 675B is often compared against Llama 3.1 405B and DeepSeek-V3.

Mistral Large 3 675B vs Llama 3.1 405B: While Llama 3.1 405B is a dense model, Mistral's MoE architecture often results in faster inference (tokens per second) once the model is loaded into memory. Mistral also generally leads in multilingual tasks and specific European-language nuances.
Mistral Large 3 675B vs DeepSeek-V3: DeepSeek-V3 is another MoE powerhouse. Mistral Large 3 typically offers superior performance in "out-of-the-box" instruction following and vision-integrated tasks, whereas DeepSeek may offer higher performance in pure coding/math at the cost of more complex deployment requirements.

For practitioners, the choice to run Mistral Large 3 675B locally usually comes down to the need for a 2025 training cutoff and superior vision capabilities integrated into a single, high-reasoning MoE framework.

Related Models

Mistral AI

Explore the Provider

See all Mistral AI models

Aggregate stats, leaderboard, release timeline, and benchmark coverage across every Mistral AI model we track.

Open Mistral AI

Explore the Family

See every Mistral release

The full Mistral family leaderboard with sizes, benchmark scores, and a release timeline.

Open Mistral

Find the Best Hardware for This Model

Use our hardware calculator to find the optimal device for running this model.

Mistral AI

Mistral Large 3 675B

Mistral AI's state-of-the-art MoE model with 675B total / 41B active parameters. Multimodal with vision. Designed for production-grade assistants and enterprise workflows.

675B paramsMoE128K ctxText + Vision

View on Hugging Face

Run with Ollama Official Page

Our Take

Best for: Datacenter-tier inference at frontier quality

A solid 675B-parameter MoE language model from Mistral AI. A pragmatic middle-ground choice when you need open weights without a flagship-sized footprint.

Run this onNVIDIA A100 SXM4 80GBCheapest card in our directory with comfortable headroom (80 GB) for this model at Q4 (~66.3 GB).

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Capabilities

Chat

Code Generation

Vision

Reasoning

Function Calling

Multilingual

Math

Instruction Following

Model Specifications

Parameters675B

Active Params41B

ArchitectureMoE

Context Length128K tokens

ModalityText + Vision

Training Cutoff2025

ProviderMistral AI

Download Size681.5 GB

Community

Monthly Downloads1.6K

Likes228

Last Updated5 months ago

Quick Start

Run with Ollama

Copy and paste this command to start running the model locally.

ollama run mistral-large-3:675b-cloud

Download from Hugging Face

Access model weights, configuration files, and documentation.

Download from Hugging Face

License

Mistral Research LicenseView Full License

Performance & Scoring

Benchmarks

Arena Score

89.5

Overall Score

61.7BB

Benchmark40%

89.5

Popularity25%

15.4

Efficiency20%

50.8

Versatility15%

79.0

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality
Q2_K	57.7 GB	Low	Aggressive quantization — smallest size, noticeable quality loss
Q4_K_MRecommended	66.3 GB	Good	Best balance of size and quality for most use-cases
Q5_K_M	70.4 GB	Very Good	Slightly better quality than Q4 with moderate size increase
Q6_K	75.3 GB	Excellent	Near-lossless quality with manageable size
Q8_0	85.5 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	124.5 GB	Full	Full 16-bit floating point — maximum quality, largest size

Hardware Compatibility

See which devices can run this model and at what quality level.

Hide F tierOnly featured devices

102 devices


Intel Gaudi 3 AI AcceleratorIntel	SS	44.9 tok/s	66.3 GB
NVIDIA H200 SXM 141GBNVIDIA	SS	58.3 tok/s	66.3 GB
AMD Instinct MI300XAMD	SS	64.4 tok/s	66.3 GB
Google TPU v7 (Ironwood)Google	SS	89.7 tok/s	66.3 GB
NVIDIA B200 GPUNVIDIA	SS	97.2 tok/s	66.3 GB
Google Cloud TPU v5pGoogle	SS	33.6 tok/s	66.3 GB
AMD Instinct MI325XAMD	SS	72.9 tok/s	66.3 GB
AMD Instinct MI355XAMD	SS	97.2 tok/s	66.3 GB
Intel Gaudi 2 AI AcceleratorIntel	SS	29.8 tok/s	66.3 GB
NVIDIA H100 SXM5 80GBNVIDIA	SS	40.7 tok/s	66.3 GB
ASUS ExpertCenter Pro ET900N G3ASUS	SS	86.3 tok/s	66.3 GB
Dell Pro Max with GB300Dell	SS	86.3 tok/s	66.3 GB
Gigabyte W775-V10-L01Gigabyte	SS	86.3 tok/s	66.3 GB
HP ZGX Fury AI StationHP	SS	86.3 tok/s	66.3 GB
MSI XpertStation WS300MSI	SS	86.3 tok/s	66.3 GB
SuperMicro Super AI StationSuperMicro	SS	86.3 tok/s	66.3 GB
NVIDIA A100 SXM4 80GBNVIDIA	AA	24.8 tok/s	66.3 GB
Apple Mac Studio (M1 Ultra, 2022)Apple	AA	9.7 tok/s	66.3 GB
Apple M5 Max (18-core CPU, 40-core GPU)Apple	BB	7.5 tok/s	66.3 GB
MacBook Pro 16-inch M5 Max (2026)Apple	BB	7.5 tok/s	66.3 GB
MacBook Pro 16" M5 Max (2026)Apple	BB	7.5 tok/s	66.3 GB
Apple M4 Max (40-core GPU)Apple	BB	6.6 tok/s	66.3 GB
Apple Mac Studio (M4 Max, 2025)Apple	BB	6.6 tok/s	66.3 GB
MacBook Pro 14-inch M4 Max (2024)Apple	BB	6.6 tok/s	66.3 GB
MacBook Pro 16" M4 Max (2024)Apple	BB	6.6 tok/s	66.3 GB

Rows per page

Page 1 of 5

Run Locally vs API

Energy cost on Apple M4 Pro (14-core CPU, 20-core GPU) (~3.3 tok/s, Q4_K_M) vs flagship API pricing.

Source	Cost per 1M tokens
Local (energy only)Mistral Large 3 675B on Apple M4 Pro (14-core CPU, 20-core GPU) · ~3.3 tok/s · 60W	$0.603
GPT-5.5OpenAI · in $5.00 · out $30.00	$12.50
Claude Opus 4.7 ThinkingAnthropic · in $5.00 · out $25.00	$11.00
Gemini 3.5 FlashGoogle · in $1.50 · out $9.00	$3.75
Grok 4.3xAI · in $1.25 · out $2.50	$1.63

API prices blended at 70% input / 30% output.

Hardware amortisation not included. Run the full ROI calculator for payback math.

Run the full ROI calculator

Rent in the Cloud

Cheapest current cloud rentals with at least 66 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA A100 80GB SXMVast.ai · On-Demand · 80 GB VRAM	$0.27
AMD Instinct MI300XRunPod · Community · 192 GB VRAM	$0.50
NVIDIA H200 NVLRunPod · Community · 141 GB VRAM	$0.50
NVIDIA A100 80GB SXMVast.ai · Spot · 80 GB VRAM	$0.53
NVIDIA A100 80GB PCIeVast.ai · Spot · 80 GB VRAM	$1.02

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.

See the full price index

About This Model

Architecture and Technical Details

Key Technical Specifications:

Total Parameters: 675B
Active Parameters: 41B per token
Context Window: 128,000 tokens
Modality: Multimodal (Text + Vision)
Training Cutoff: 2025
Architecture: Sparse MoE with native vision encoders

Capabilities and Use Cases

Advanced Reasoning and Coding

Multimodal Workflows

The native vision capability enables:

Technical Diagram Interpretation: Converting whiteboard architecture sketches into Mermaid.js or Terraform code.
Document Processing: Extracting structured JSON from complex, non-standardized invoices or medical forms.
UI/UX Auditing: Reviewing frontend screenshots for accessibility or design-to-code alignment.

Function Calling and Multilingual Support

Running Mistral Large 3 675B Locally

Mistral Large 3 675B VRAM Requirements

4-bit Quantization (Q4_K_M): Requires ~380GB - 400GB of VRAM. This is the best quantization for Mistral Large 3 675B, offering the best balance between perplexity loss and memory footprint.
2-bit Quantization (IQ2_XS): Requires ~180GB - 200GB of VRAM. This is the bare minimum for "functional" use, though some intelligence degradation occurs.

Recommended Hardware Configurations

How to run 675B model on consumer GPUs? You cannot run this on a single RTX 4090. To run a Q4 quantization locally, you generally need:

Multi-GPU Workstation: 8x NVIDIA RTX 6000 Ada (48GB each) or 16x RTX 4090 (24GB each) via PCIe expansion/NVLink.
Mac Studio/Mac Pro: An M2 Ultra or M4 Max with 192GB of Unified Memory can run a 2-bit or 3-bit version, but for 4-bit, you would need a cluster or a specialized high-memory Mac configuration if available.
Server Grade: A single node of 8x H100 or A100 (80GB) is the standard production environment for this model.

Getting Started with Ollama

The quickest way to deploy is via Ollama. Once you have the necessary VRAM, you can run:

ollama run mistral-large:675b

(Note: Ensure you are using a version of Ollama that supports the 2025 Mistral MoE architecture updates).

How It Compares

When choosing a local AI model 675B parameters 2025, Mistral Large 3 675B is often compared against Llama 3.1 405B and DeepSeek-V3.

Mistral Large 3 675B vs Llama 3.1 405B: While Llama 3.1 405B is a dense model, Mistral's MoE architecture often results in faster inference (tokens per second) once the model is loaded into memory. Mistral also generally leads in multilingual tasks and specific European-language nuances.
Mistral Large 3 675B vs DeepSeek-V3: DeepSeek-V3 is another MoE powerhouse. Mistral Large 3 typically offers superior performance in "out-of-the-box" instruction following and vision-integrated tasks, whereas DeepSeek may offer higher performance in pure coding/math at the cost of more complex deployment requirements.