
Alibaba's flagship Qwen3 MoE model with 235B total / 22B active parameters. Hybrid thinking/non-thinking modes. Trained on 36T tokens across 119 languages. Competes with DeepSeek-R1 and o1.
Copy and paste this command to start running the model locally:

```
ollama run qwen3:235b-a22b
```
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 31.7 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (recommended) | 36.3 GB | Good | Best balance of size and quality for most use-cases |
| Q5_K_M | 38.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 41.2 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 46.7 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 67.6 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.

| Device | Tier | Speed | VRAM Used |
|---|---|---|---|
| NVIDIA A100 SXM4 80GB | SS | 45.2 tok/s | 36.3 GB |
| NVIDIA H100 SXM5 80GB | SS | 74.2 tok/s | 36.3 GB |
| Google Cloud TPU v5p | SS | 61.3 tok/s | 36.3 GB |
| NVIDIA H200 SXM 141GB | SS | 106.4 tok/s | 36.3 GB |
| NVIDIA B200 | SS | 177.3 tok/s | 36.3 GB |
| NVIDIA L40S | AA | 19.1 tok/s | 36.3 GB |
Qwen3-235B-A22B is Alibaba Cloud’s flagship Mixture of Experts (MoE) model, designed to compete directly with frontier-class reasoning models like DeepSeek-R1 and OpenAI’s o1. Released as a significant upgrade to the Qwen series, this model is built on a massive 36-trillion token dataset covering 119 languages. With 235 billion total parameters and 22 billion active parameters per token, it represents a strategic balance between high-capacity knowledge storage and computational efficiency during inference.
For local practitioners, Qwen3-235B-A22B is a heavyweight contender that requires substantial hardware but offers state-of-the-art performance in reasoning, mathematics, and multilingual instruction following. Unlike smaller dense models, it uses its MoE architecture to deliver the capability of a 200B+ parameter model while maintaining inference latency more typical of a 20B–30B parameter model, provided the weights can fit into accessible VRAM.
The core of the Qwen3-235B-A22B is its Mixture of Experts (MoE) architecture. In a standard dense model, every parameter is activated for every token generated. In this MoE implementation, only 22 billion of the 235 billion parameters are "active" for any given computation.
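To make the routing idea concrete, here is a toy top-k expert router: a gate scores every expert for a token, and only the top-k highest-scoring experts actually run. The sizes below are purely illustrative and are not Qwen3's real expert count or routing configuration.

```python
import random

random.seed(0)

N_EXPERTS, TOP_K = 8, 2  # illustrative sizes, not Qwen3's real config

def route(token_scores, k=TOP_K):
    """Return the indices of the k experts activated for this token."""
    ranked = sorted(range(len(token_scores)), key=token_scores.__getitem__)
    return ranked[-k:]  # only these experts run for this token

# Pretend gate scores for one token:
scores = [random.random() for _ in range(N_EXPERTS)]
active = route(scores)
# Only TOP_K of N_EXPERTS experts do any compute for this token,
# while all N_EXPERTS experts' weights must still be resident in memory.
```

The comment at the end is the key point for local inference: routing saves compute per token, not storage.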
When you run Qwen3-235B-A22B locally, it is vital to understand the distinction between storage and compute. While the MoE architecture allows for faster token generation (only 22B parameters are being crunched by the GPU for any given token), the entire 235B parameter set must still reside in VRAM. This means the hardware barrier to entry is determined by the total parameter count, while tokens per second (TPS) is governed by the active parameter count.
Qwen3-235B-A22B is not a general-purpose chatbot; it is a reasoning engine. It features a hybrid thinking/non-thinking mode, allowing users to toggle between standard responses and "Chain of Thought" (CoT) processing for complex tasks.
The model excels on reasoning benchmarks, particularly multi-step logic problems and high-level mathematics. For engineers, this translates to a tool capable of architectural planning and debugging complex system interactions that smaller models (like Llama 3.1 70B) often fail to grasp.
The model also delivers high-tier performance in software development.
Beyond technical utility, the model is one of the most capable multilingual models available for local use. It handles nuanced translation and creative writing in non-English languages with significantly higher fidelity than Western-centric models of similar size.
Running a model of this scale is a hardware-intensive endeavor. You cannot run this on a single consumer GPU without aggressive quantization that may degrade performance.
To determine the best GPU for Qwen3-235B-A22B, you must first select your quantization level. VRAM requirements can be roughly estimated as (parameters × bits per weight) / 8 bytes, multiplied by about 1.2 to account for context and overhead.
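That rule of thumb is easy to sketch in a few lines. The effective bits-per-weight values below are approximations (actual quantized file sizes vary by tool and format):

```python
def vram_gb(total_params: float, bits_per_weight: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM estimate: (parameters * bits) / 8 bytes,
    plus ~20% overhead for context and runtime buffers."""
    return total_params * bits_per_weight / 8 * overhead / 1e9

# Approximate effective bit-widths (assumptions, not exact GGUF sizes):
for fmt, bits in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("FP16", 16.0)]:
    print(f"{fmt}: ~{vram_gb(235e9, bits):.0f} GB")
```

Running this for the full 235B parameter count shows why multi-GPU rigs or high-memory Macs are the realistic target for this model.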
Because only 22B parameters are active, tokens-per-second throughput is surprisingly high for a model of this size. On a quad-4090 setup using llama.cpp, you can expect 5–12 tokens per second depending on quantization and context usage. On Apple Silicon (M2 Ultra), speeds generally hover around 3–6 tokens per second.
The fastest way to run a 235B-class model on consumer or prosumer hardware is via Ollama. If you have the required VRAM, you can pull the model directly:
```
ollama run qwen3:235b
```
Ollama will automatically handle the MoE offloading, but ensure your system has enough swap space if you are pushing the limits of your physical VRAM.
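If you are close to the limits of your VRAM, one practical lever is capping the context window via a Modelfile. A minimal sketch (the `num_ctx` value and the `qwen3-8k` name are illustrative choices, not defaults):

```
FROM qwen3:235b
PARAMETER num_ctx 8192
```

Build and run the variant with `ollama create qwen3-8k -f Modelfile` followed by `ollama run qwen3-8k`. A smaller context window shrinks the KV cache, which can be the difference between fitting in VRAM and spilling to swap.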
When evaluating Qwen3-235B-A22B, it is most often compared to DeepSeek-R1 and Llama 3.1 405B.
DeepSeek-R1 (671B total / 37B active) is a larger model that generally outperforms Qwen3 in pure "thinking" tasks and PhD-level reasoning. However, Qwen3-235B-A22B is significantly easier to fit on prosumer hardware. While DeepSeek-R1 requires nearly 400GB of VRAM for a reasonable 4-bit quantization, Qwen3 fits into ~150GB, making it much more viable for local workstations.
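Those footprints can be sanity-checked with quick arithmetic, assuming roughly 4.85 effective bits per weight for a Q4_K_M-style quantization (an approximation; exact file sizes differ):

```python
def quant_footprint_gb(total_params: float, bits: float = 4.85) -> float:
    """Raw weight footprint in GB at a given effective bit-width
    (weights only, no context/KV-cache overhead)."""
    return total_params * bits / 8 / 1e9

print(f"DeepSeek-R1 (671B): ~{quant_footprint_gb(671e9):.0f} GB")
print(f"Qwen3-235B-A22B:    ~{quant_footprint_gb(235e9):.0f} GB")
```

The results line up with the figures above: roughly 400 GB for DeepSeek-R1's weights versus roughly 150 GB for Qwen3's, which is the gap that makes Qwen3 viable on a local workstation.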
Llama 3.1 405B is a dense model, meaning every parameter is used for every token. This makes Llama 405B significantly slower for local inference than Qwen3-235B-A22B. While Llama may have a slight edge in general English-language instruction following, Qwen3 wins on inference speed and multilingual capabilities.
You should choose this model if you need frontier-level reasoning and have a multi-GPU or high-RAM Mac setup, but cannot justify the massive 400GB+ VRAM requirement of models like DeepSeek-R1 or Llama 405B. It is the "middle ground" of the ultra-large model category—offering the power of a 200B+ model with the operational footprint of a much smaller one.