
State-of-the-art 685B MoE model with DeepSeek Sparse Attention and scalable RL. Gold-medal results in IMO 2025 and IOI 2025. Performance comparable to GPT-5.
A solid 685B-parameter MoE language model from DeepSeek. It pulls ahead on competition math (AIME 2026: 94/100), so reach for it when that's the dimension that matters.
Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.
Copy and paste this command to start running the model locally.

`ollama run deepseek-v3.2:cloud`

Access model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 52.1 GB | Low |
| Q4_K_M (Recommended) | 59.8 GB | Good |
| Q5_K_M | 63.5 GB | Very Good |
| Q6_K | 68.0 GB | Excellent |
| Q8_0 | 77.2 GB | Near Perfect |
| FP16 | 112.4 GB | Full |
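As a quick sanity check, the table above can be queried programmatically: given a VRAM budget, pick the highest-quality format that fits. The sizes are copied straight from the table; real-world headroom for the KV cache and runtime overhead is not included, so treat this as a sketch rather than a sizing tool.

```python
# Quantization formats and their VRAM footprints, as listed in the table above.
FORMATS = [
    ("Q2_K", 52.1),
    ("Q4_K_M", 59.8),
    ("Q5_K_M", 63.5),
    ("Q6_K", 68.0),
    ("Q8_0", 77.2),
    ("FP16", 112.4),
]

def best_format(vram_gb):
    """Return the highest-quality format that fits in vram_gb, or None."""
    fitting = [(name, size) for name, size in FORMATS if size <= vram_gb]
    if not fitting:
        return None
    # Larger footprint == higher quality in this list, so take the biggest fit.
    return max(fitting, key=lambda f: f[1])[0]

print(best_format(80))   # Q8_0
print(best_format(60))   # Q4_K_M
```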
See which devices can run this model and at what quality level.

| Device | Vendor | Tier | Speed | VRAM Used |
|---|---|---|---|---|
| NVIDIA H100 SXM5 80GB | NVIDIA | SS | 45.1 tok/s | 59.8 GB |
| NVIDIA H200 SXM 141GB | NVIDIA | SS | 64.6 tok/s | 59.8 GB |
| Google Cloud TPU v5p | Google | SS | 37.2 tok/s | 59.8 GB |
| Google TPU v7 (Ironwood) | Google | SS | 99.3 tok/s | 59.8 GB |
| NVIDIA B200 GPU | NVIDIA | SS | 107.6 tok/s | 59.8 GB |
| Gigabyte W775-V10-L01 | Gigabyte | SS | 95.5 tok/s | 59.8 GB |
| SuperMicro Super AI Station | SuperMicro | SS | 95.5 tok/s | 59.8 GB |
| NVIDIA A100 SXM4 80GB | NVIDIA | SS | 27.4 tok/s | 59.8 GB |
Energy cost on an Apple M4 Pro (14-core CPU, 20-core GPU) at ~3.7 tok/s (Q4_K_M) vs flagship API pricing.

| Source | Cost per 1M tokens |
|---|---|
| Local (energy only): DeepSeek-V3.2 on Apple M4 Pro (14-core CPU, 20-core GPU) · ~3.7 tok/s · 60W | $0.545 |
| GPT-5.5 (OpenAI) · in $5.00 · out $30.00 | $12.50 |
| Claude Opus 4.7 Thinking (Anthropic) · in $5.00 · out $25.00 | $11.00 |
| Gemini 3.1 Flash Lite Preview (Google) · in $0.250 · out $1.50 | $0.625 |
| Grok 4.3 beta (xAI) · in $3.00 · out $15.00 | $6.60 |
API prices blended at 70% input / 30% output.
Hardware amortisation not included. Run the full ROI calculator for payback math.
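Both kinds of figures above can be reproduced with a short sketch. The ~$0.121/kWh electricity price is an assumption back-solved from the local row; your actual rate will shift the local number proportionally.

```python
def local_cost_per_million_tokens(tok_per_s, watts, usd_per_kwh):
    """Energy-only cost of generating 1M tokens on local hardware."""
    hours = 1_000_000 / (tok_per_s * 3600)      # wall-clock hours for 1M tokens
    return hours * (watts / 1000) * usd_per_kwh  # kWh consumed * price

def blended_api_cost(in_price, out_price, input_share=0.70):
    """Blend per-1M-token input/output API prices at a 70/30 split."""
    return input_share * in_price + (1 - input_share) * out_price

# Apple M4 Pro row: ~3.7 tok/s at 60W, assuming ~$0.121/kWh electricity.
print(round(local_cost_per_million_tokens(3.7, 60, 0.121), 3))  # 0.545
# GPT-5.5 row: in $5.00 / out $30.00 at the 70/30 blend.
print(blended_api_cost(5.00, 30.00))  # 12.5
```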
DeepSeek-V3.2 is a state-of-the-art Mixture-of-Experts (MoE) model that represents the peak of open-weights performance as of 2025. With a total parameter count of 685B, it is designed to compete directly with frontier models like GPT-5 and Claude 3.5 Sonnet. Despite its massive total scale, the model utilizes a highly optimized MoE architecture where only 37B parameters are active during any single inference step, striking a unique balance between extreme reasoning capabilities and computational efficiency.
Developed by DeepSeek, this model is the culmination of advancements in scalable Reinforcement Learning (RL) and Sparse Attention mechanisms. It is specifically engineered for complex tasks that require high-level logical deduction, such as advanced mathematics and software engineering. Its performance in the IMO 2025 (International Mathematical Olympiad) and IOI 2025 (International Olympiad in Informatics) benchmarks places it in the top tier of reasoning models globally. For practitioners looking to run DeepSeek-V3.2 locally, the primary challenge is not the compute speed—thanks to the 37B active parameters—but the massive VRAM footprint required to house the 685B parameter weights.
DeepSeek-V3.2 utilizes a sophisticated Mixture-of-Experts (MoE) framework that differentiates it from dense models like Llama 3.1 405B. In a dense model, every parameter is activated for every token generated. In DeepSeek-V3.2, the 685B total parameters act as a vast knowledge base, but the router only engages 37B parameters per token. This DeepSeek-V3.2 MoE efficiency allows for much faster inference speeds (tokens per second) than a 600B+ dense model would otherwise permit, provided the hardware can accommodate the full model in memory.
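The routing idea described above can be sketched in a few lines of NumPy. This is an illustrative toy, not DeepSeek's implementation: the expert count, dimensions, and top-k value are made up, and a real MoE layer routes per token inside a transformer block.

```python
import numpy as np

def topk_moe_forward(x, router_w, expert_ws, k=2):
    """Route one token vector x through the top-k of n experts."""
    logits = x @ router_w                      # score every expert: shape (n_experts,)
    chosen = np.argsort(logits)[-k:]           # keep only the k best-scoring experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                       # softmax over the chosen experts only
    # Only the chosen experts' weights are read; the rest stay idle, which is
    # why active parameters can be far fewer than total parameters.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, chosen))

rng = np.random.default_rng(0)
d, n_experts = 16, 8                           # toy sizes, not the real model's
x = rng.standard_normal(d)
router_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = topk_moe_forward(x, router_w, expert_ws, k=2)
print(y.shape)  # (16,)
```

Note that zeroing out an expert the router did not pick leaves the output unchanged, which is exactly the sparsity the 37B-active-of-685B figure describes.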
The model features DeepSeek’s proprietary Sparse Attention mechanism, which reduces the computational overhead of the self-attention layer. It supports a context length of 128,000 tokens, making it suitable for analyzing entire codebases or long-form technical documentation. However, practitioners should note that at 128k context, the KV (Key-Value) cache requirements become a significant factor in total DeepSeek-V3.2 VRAM requirements, especially when using high-precision formats.
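To see why the KV cache matters at 128k context, the standard sizing formula is 2 (keys and values) × layers × KV heads × head dim × sequence length × bytes per element. The layer and head counts below are placeholders, not DeepSeek-V3.2's actual configuration, and attention designs like DeepSeek's compress the cache well below this naive figure; the point is only the order of magnitude.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_el=2):
    """Naive KV cache size in GiB: one K and one V vector per layer/head/position."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el
    return total_bytes / 2**30

# Placeholder configuration (illustrative only): 60 layers, 8 KV heads
# (GQA-style), head dim 128, full 128k context, FP16 cache entries.
print(kv_cache_gib(60, 8, 128, 131072))  # 30.0 GiB on top of the weights
```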
DeepSeek-V3.2 was trained on a massive dataset with a 2025 cutoff, ensuring it is up-to-date with the latest programming frameworks and mathematical research. The use of scalable RL (Reinforcement Learning) during the post-training phase has specifically tuned the model for "Chain of Thought" reasoning, allowing it to verify its own logic before outputting a final answer.
This is a text-only model optimized for high-logic environments. Unlike general-purpose chat models that focus on creative writing, DeepSeek-V3.2 is a "reasoning-first" engine.
Running DeepSeek-V3.2 locally requires a significant investment in hardware. As a 685B-parameter model released in 2025, it pushes the boundaries of what is possible on consumer and prosumer equipment.
To host this model, the primary bottleneck is VRAM. The 685B parameters must be loaded into memory to achieve usable performance.
For a Linux-based build, the usual GPU choices for DeepSeek-V3.2 are the NVIDIA RTX 6000 Ada (48 GB) or the RTX 3090/4090 (24 GB). Note that even the Q2_K quantization (~52 GB) exceeds any single one of these cards, so the model must be sharded across multiple GPUs.
Due to the MoE architecture, DeepSeek-V3.2 tokens per second are surprisingly high once the model is loaded. On an optimized 8x A100 setup, you can expect 20-30 tokens per second. On a consumer multi-GPU setup (4-bit), performance will likely hover between 2-5 tokens per second due to the overhead of PCIe communication between cards.
The quickest way to deploy the model is via Ollama or vLLM.
`ollama run deepseek-v3.2:685b-q4_K_M` (requires massive system RAM/VRAM)

DeepSeek-V3.2 sits in a rare class of models. Its most direct competitors are Llama 3.1 405B and Grok-1.
For practitioners, the choice to use DeepSeek-V3.2 over a smaller 70B model comes down to whether your use case requires "frontier-level" reasoning. If you are building a simple RAG (Retrieval-Augmented Generation) system, a 70B model is more cost-effective. If you are building an automated code-refactoring agent or a mathematical verification tool, the hardware investment for DeepSeek-V3.2 is justified by its leap in logical accuracy.