
Mistral AI's 123B dense flagship with state-of-the-art reasoning, 80+ coding languages, and multilingual capabilities. 128K context.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 172.0 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 197.8 GB | Good | Best balance of size and quality for most use-cases |
| Q5_K_M | 210.1 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 224.9 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 255.6 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 372.5 GB | Full | Full 16-bit floating point — maximum quality, largest size |
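As a rough sanity check on figures like these, weights-only memory can be estimated from the parameter count and the bits-per-weight of the chosen quantization. The bpw values below are typical llama.cpp figures and should be treated as assumptions; actual requirements, as in the table above, come out higher once the KV cache (especially at long context), activations, and runtime overhead are added:

```python
# Rough weights-only VRAM estimate for a dense model.
# bpw values are typical llama.cpp quantization sizes (assumption);
# real usage adds KV cache, activations, and runtime overhead on top.

PARAMS = 123e9  # Mistral Large 2: 123B dense parameters

BPW = {
    "Q2_K": 2.6,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def weights_gb(params: float, bpw: float) -> float:
    """Memory for the weights alone, in GB (params * bits / 8 bits-per-byte)."""
    return params * bpw / 8 / 1e9

for fmt, bpw in BPW.items():
    print(f"{fmt:>7}: ~{weights_gb(PARAMS, bpw):6.1f} GB (weights only)")
```

At FP16 this gives 246 GB for the weights alone; the gap to the figures in the table is the long-context and runtime overhead.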
See which devices can run this model and at what quality level.
| Device | Speed (Q4_K_M) | VRAM Used |
|---|---|---|
| NVIDIA B200 | 32.6 tok/s | 197.8 GB |
| Apple M5 | 0.6 tok/s | 197.8 GB |
| Apple M4 | 0.5 tok/s | 197.8 GB |
Mistral Large 2 is Mistral AI’s flagship 123B dense parameter model, designed to compete directly with frontier models like GPT-4o and Llama 3.1 405B. Released as a significant upgrade over the original Mistral Large, this model represents a specific strategic choice in architecture: a 123B dense configuration. This size is intentionally calibrated to provide state-of-the-art reasoning and coding performance while remaining accessible to practitioners who have moved beyond single-GPU setups but cannot support the massive infrastructure required for 400B+ parameter models.
For local deployment, Mistral Large 2 fills the gap between the ubiquitous 70B models and the massive, often unwieldy, ultra-large-scale models. It excels in complex reasoning tasks, high-tier mathematical problem solving, and agentic workflows that require reliable function-calling. Because it is a dense model rather than a Mixture of Experts (MoE), every parameter is active during inference, providing a level of "intellectual density" that often translates to more stable performance in nuanced, multi-step instruction following compared to smaller or sparse models.
The architecture of Mistral Large 2 is a standard Transformer-based dense decoder-only model with 123 billion parameters. Unlike the Mixtral series (which uses MoE), Mistral Large 2 processes every token through all 123B parameters. While this makes the model more computationally expensive per token than an MoE model of a similar total size, it results in a higher ceiling for reasoning and knowledge retrieval accuracy.
A key technical highlight is the 128,000-token context window. This allows practitioners to process entire codebases, long legal documents, or complex technical manuals in a single prompt. For local users, the 128K context is particularly valuable for Retrieval-Augmented Generation (RAG) pipelines where high-precision synthesis of multiple documents is required.
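In such a RAG pipeline, even a 128K window needs budgeting. A minimal sketch of greedy document packing, using the common ~4-characters-per-token heuristic as a stand-in for a real tokenizer (an assumption; a production pipeline should count tokens with the model's actual tokenizer):

```python
# Greedy packing of retrieved documents into a context budget.
# The chars/4 token estimate is a rough heuristic (assumption);
# count tokens with the model's tokenizer in a real pipeline.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def pack_context(docs: list[str], budget: int = 128_000,
                 reserve: int = 4_096) -> list[str]:
    """Keep documents (in retrieval-rank order) until the budget,
    minus a reserve for the question and the answer, is exhausted."""
    remaining = budget - reserve
    packed = []
    for doc in docs:
        cost = estimate_tokens(doc)
        if cost > remaining:
            break
        packed.append(doc)
        remaining -= cost
    return packed

docs = ["spec " * 2000, "manual " * 3000, "notes " * 50_000]
selected = pack_context(docs, budget=16_000)
print(len(selected))  # the oversized third document no longer fits
```

The same function with the default 128K budget is where a model like this pays off: whole codebases or document sets fit before the packing loop ever breaks.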
The model was trained with a focus on multilingual support and code generation. It supports over 80 coding languages and dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, and Chinese. The training cutoff is in 2024, ensuring it has knowledge of recent library updates and modern coding practices.
Mistral Large 2 is not a general-purpose "chat" toy; it is a workhorse for technical and analytical workloads. Its reasoning benchmark scores put it in the top tier of open-weight models, specifically in its ability to handle logic-heavy tasks without "hallucinating" logical steps.
Mistral Large 2 is one of the premier open-weight choices for coding tasks. It supports 80+ languages and is particularly proficient in Python, C++, Java, and Rust. Because of its large parameter count and dense architecture, it can handle complex refactoring tasks that require an understanding of cross-file dependencies—especially when utilizing the full 128K context window.
The model is natively trained for tool use and function-calling. For developers building local AI agents, Mistral Large 2 provides the reliability needed to output valid JSON and call external APIs or local scripts without frequent syntax errors. This makes it a viable core for local automation systems where reliability is non-negotiable.
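A sketch of the receiving side of such an agent loop, assuming the model emits a tool call as JSON with hypothetical `name`/`arguments` fields (the exact schema depends on your serving stack; adapt accordingly):

```python
import json

# Minimal dispatch loop for model-emitted tool calls.
# The {"name": ..., "arguments": ...} shape is an assumption;
# match it to whatever schema your serving stack produces.

def get_weather(city: str) -> str:
    # Stub standing in for a real API or local script.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(raw: str) -> str:
    """Parse the model's output and call the named tool, rejecting
    unknown tools and malformed JSON instead of executing blindly."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"error: invalid JSON ({exc.msg})"
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return f"error: unknown tool {call.get('name')!r}"
    return fn(**call.get("arguments", {}))

model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_output))  # Sunny in Paris
```

The explicit registry and error paths matter in a local automation system: a model that emits valid JSON most of the time still needs the malformed-output branch to fail safely rather than crash the loop.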
While many models claim multilingualism, Mistral Large 2 maintains high reasoning capabilities across its supported languages. It can perform technical translation—such as translating a complex engineering spec from English to French—while preserving the underlying logic and technical accuracy of the text.
To run Mistral Large 2 locally, you must account for its 123B dense architecture. Unlike 70B models which can squeeze onto two consumer GPUs, or 8B models that run on a laptop, a 123B model requires a multi-GPU workstation or a high-end Mac Studio.
VRAM is the primary bottleneck for this model. The quantization table above lists the estimated requirements at each level.
As for the best GPU setup for Mistral Large 2, we recommend either a multi-GPU workstation or a Mac with sufficient unified memory.
The best quantization for Mistral Large 2 for most practitioners is Q4_K_M or IQ4_XS. These 4-bit versions retain roughly 98-99% of the FP16 model's intelligence while reducing the memory footprint by over 60%.
Tokens-per-second throughput varies widely by hardware: top-end datacenter GPUs reach roughly 20-30 tok/s at Q4_K_M, while base Apple Silicon machines can fall below 1 tok/s.
The quickest way to get started is Ollama: running `ollama run mistral-large` pulls the library's default quantization. For more precise VRAM management, using llama.cpp directly lets you offload a specific number of layers to each GPU and make the most of your available hardware.
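Beyond the CLI, Ollama also exposes a local REST API (by default on port 11434). A minimal sketch of a request against its `/api/generate` endpoint, with payload construction kept separate so it can be inspected before anything is sent; the model tag and `num_ctx` value are illustrative:

```python
import json
import urllib.request

# Build and send a request to Ollama's local /api/generate endpoint.
# The model tag and num_ctx below are illustrative; check `ollama list`
# for the tags actually pulled on your machine.

def build_payload(prompt: str, model: str = "mistral-large",
                  num_ctx: int = 8192) -> dict:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON response, not a stream
        "options": {"num_ctx": num_ctx},  # context window to allocate
    }

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama daemon with the model pulled):
# print(generate("Summarize the Rust borrow checker in two sentences."))
```

Keeping `num_ctx` modest by default is deliberate: allocating the full 128K window inflates the KV cache dramatically, so raise it only for workloads that need it.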
When evaluating Mistral Large 2 performance, it is most often compared to Llama 3.1 70B and Llama 3.1 405B.
Mistral Large 2 is significantly more capable than the Llama 3.1 70B model, particularly in coding and complex multilingual reasoning. However, the 70B model is much easier to run, fitting comfortably on two RTX 3090s. Choose Mistral Large 2 if your hardware supports it and your task requires "frontier-level" logic that 70B models occasionally fail.
Llama 3.1 405B is arguably the more powerful model in terms of raw knowledge and scale, but it is nearly impossible to run locally for most individuals (requiring ~800GB of VRAM for FP16 or ~250GB for 4-bit). Mistral Large 2 offers a "compressed" version of that same intelligence class. In many benchmarks, particularly coding, Mistral Large 2 punches way above its weight, often matching or exceeding the 405B model's utility in a package that is a fraction of the size.
While Mixtral 8x22B is an MoE model with a similar total parameter count (141B), it only uses 39B active parameters per token. This makes Mixtral faster for inference but generally less "smart" for deep reasoning compared to the 123B dense architecture of Mistral Large 2. If speed is your priority, use Mixtral; if accuracy and reasoning are your priorities, Mistral Large 2 is the superior choice among local 123B-class models.