
Dense 27B natively multimodal model. All parameters active per forward pass, giving highest per-token reasoning density in the Qwen3.5 series. Ties GPT-5 mini on SWE-bench Verified (72.4).
Copy and paste this command to start running the model locally:

```
ollama run qwen3.5:27b
```

Access model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 67.1 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 72.8 GB | Good | Best balance of size and quality for most use cases |
| Q5_K_M | 75.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 78.7 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 85.5 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 111.1 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.

| Device | Tier | Throughput | VRAM Used |
|---|---|---|---|
| NVIDIA B200 | S | 88.5 tok/s | 72.8 GB |
| NVIDIA H200 SXM 141GB | S | 53.1 tok/s | 72.8 GB |
| NVIDIA H100 SXM5 80GB | A | 37.1 tok/s | 72.8 GB |
| Google Cloud TPU v5p | S | 30.6 tok/s | 72.8 GB |
| NVIDIA A100 SXM4 80GB | B | 22.6 tok/s | 72.8 GB |
Qwen3.5-27B is Alibaba Cloud’s 2025 flagship mid-sized model, engineered to maximize the "reasoning density" possible on high-end consumer hardware. Unlike the trend toward Mixture-of-Experts (MoE) architectures, Qwen3.5-27B utilizes a dense architecture where all 27 billion parameters are active during every forward pass. This design choice prioritizes raw intelligence and instruction-following precision over the lower inference costs of MoE models, positioning it as a direct competitor to much larger models.
In the local AI ecosystem, this model occupies the "sweet spot" for practitioners: it is small enough to fit on a single high-end consumer GPU when quantized, yet powerful enough to tie GPT-5 mini on the SWE-bench Verified benchmark with a score of 72.4. For developers looking to run Qwen3.5-27B locally, it offers a natively multimodal experience, handling text, code, and vision tasks within a massive 262,144-token context window.
Qwen3.5-27B's dense transformer architecture sets the 2025 standard for local models in the 27B-parameter class. While MoE models (like Mixtral) achieve speed by activating only a fraction of their parameters per token, Qwen3.5-27B's dense design ensures that every generated token benefits from the full 27B parameters. This yields higher per-token intelligence, particularly in edge cases and complex logic where MoE routing can sometimes falter.
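The per-token cost difference between the two designs can be sketched with quick arithmetic. The Mixtral-style figures below (roughly 47B total parameters, about 12.9B active with 2 of 8 experts routed) are published approximations used purely for illustration, not exact architecture numbers.

```python
# Rough per-token active-parameter comparison: dense vs. MoE.

def active_params_dense(total_b: float) -> float:
    """A dense model uses every parameter on every forward pass."""
    return total_b

def active_params_moe(shared_b: float, expert_b: float,
                      experts_active: int) -> float:
    """An MoE model activates shared weights plus k routed experts."""
    return shared_b + experts_active * expert_b

dense = active_params_dense(27.0)        # Qwen3.5-27B: all 27B active
moe = active_params_moe(1.3, 5.8, 2)     # Mixtral-like: ~12.9B active

print(f"dense: {dense:.1f}B active, MoE: {moe:.1f}B active per token")
```

The dense model spends roughly twice the per-token compute of the MoE example, which is exactly the trade the surrounding text describes: higher inference cost in exchange for every token seeing the full weight set.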
The model features a native context length of 262,144 tokens. This allows for the ingestion of entire codebases, long technical manuals, or hours of transcript data without the "lost in the middle" phenomena common in smaller context models. Because it is natively multimodal, the vision capabilities are integrated into the same architecture, allowing for interleaved image and text processing without relying on a separate vision encoder-adapter setup.
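To gauge whether a codebase or document set actually fits in that 262,144-token window, a common back-of-envelope heuristic is about 4 characters per token for English text and code. This sketch uses that ratio; the model's real tokenizer will produce somewhat different counts.

```python
# Heuristic check of whether text fits the 262,144-token context.
# The ~4 characters/token ratio is a rough average, not the
# model's actual tokenizer.

CONTEXT_TOKENS = 262_144
CHARS_PER_TOKEN = 4  # rough heuristic

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], reserve_for_output: int = 4096) -> bool:
    # Leave headroom for the model's own generated tokens.
    budget = CONTEXT_TOKENS - reserve_for_output
    return sum(estimated_tokens(t) for t in texts) <= budget

# A ~500 KB codebase is roughly 125k estimated tokens,
# comfortably inside the window.
print(fits_in_context(["x" * 500_000]))
```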
Qwen3.5-27B performance shines in high-stakes reasoning tasks that typically require a 70B+ parameter model. Its 2025 training cutoff ensures it is familiar with the latest frameworks and libraries, making it a premier choice for technical workflows.
The model is a top-tier performer for coding tasks. With a 72.4 score on SWE-bench Verified, it can resolve real-world GitHub issues, perform complex refactoring, and generate unit tests across multiple files. The 262k context window is particularly useful here, as it allows the model to "see" the entire project structure during a debugging session.
As a natively multimodal model, Qwen3.5-27B also excels at vision-language tasks, handling interleaved image and text inputs within the same architecture.
Qwen has historically led in multilingual benchmarks. The 3.5-27B variant continues this, offering GPT-4 level performance in languages across Asia and Europe. Its mathematical reasoning is robust, handling calculus, symbolic logic, and competitive programming problems with high accuracy.
To run Qwen3.5-27B locally, hardware selection is dictated by VRAM. Because this is a dense 27B model, the memory footprint is larger than an 8B model but significantly more manageable than a 70B model.
VRAM usage depends entirely on your choice of quantization. To calculate Qwen3.5-27B hardware requirements, use this breakdown:
| Quantization | VRAM (Weights Only) | Recommended Total VRAM (incl. Context) |
| :--- | :--- | :--- |
| BF16 (Unquantized) | ~54 GB | 64 GB+ (Requires 2x RTX 3090/4090 or A6000) |
| Q8_0 (8-bit) | ~29 GB | 32 GB+ (Requires 2x RTX 3090/4090) |
| Q4_K_M (4-bit) | ~17 GB | 24 GB (Fits on a single RTX 3090/4090) |
| IQ3_M (3-bit) | ~12 GB | 16 GB (Fits on RTX 4080 or 16GB Mac) |
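The weight figures above follow directly from parameter count times bits per weight. The sketch below reproduces them; the bits-per-weight values are approximate averages for the llama.cpp quantization formats named in the table.

```python
# Back-of-envelope weight memory: params × bits-per-weight / 8.
# Bits-per-weight values are approximate llama.cpp averages.

PARAMS = 27e9  # Qwen3.5-27B

def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    return params * bits_per_weight / 8 / 1e9  # decimal GB

for name, bpw in [("BF16", 16.0), ("Q8_0", 8.5),
                  ("Q4_K_M", 4.85), ("IQ3_M", 3.66)]:
    print(f"{name}: ~{weight_gb(bpw):.0f} GB")
```

This covers weights only; as the table's right-hand column notes, you still need headroom on top for the KV cache and runtime overhead.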
For most practitioners, Q4_K_M (4-bit) is the best quantization for Qwen3.5-27B. At 4-bit, the model retains over 99% of its FP16 intelligence while fitting comfortably into the 24GB VRAM buffer of an RTX 3090 or 4090. This leaves roughly 6GB of VRAM for KV cache (context), allowing for a respectable context window of 16k–32k tokens before needing to offload to system RAM.
The fastest way to deploy this model is via Ollama. Once installed, run:
```
ollama run qwen3.5:27b
```
This will automatically pull a 4-bit quantized version optimized for your available hardware.
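Once the model is pulled, Ollama exposes it over a local REST API, so you can script against it instead of using the interactive prompt. This minimal stdlib-only sketch targets Ollama's standard `/api/generate` endpoint on the default port 11434.

```python
# Minimal sketch of querying a locally running Ollama server
# after `ollama run qwen3.5:27b` has pulled the model.
import json
import urllib.request

def build_request(prompt: str, model: str = "qwen3.5:27b") -> dict:
    # stream=False returns one JSON object instead of NDJSON chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the Ollama server to be running):
#   print(generate("Summarize this function's behavior: ..."))
```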
When evaluating Qwen3.5-27B, the most realistic alternatives are Gemma 2 27B and Llama 3.1 8B/70B.
Google’s Gemma 2 27B is the closest architectural rival: it is likewise a dense model at the same parameter count, but it is text-only and ships with a much smaller 8K context window, leaving Qwen3.5-27B ahead on multimodal and long-context workloads.
Qwen3.5-27B is currently the most capable model available for users restricted to a 24GB VRAM envelope who refuse to compromise on coding performance or multimodal capabilities. Its reasoning benchmark scores show that parameter efficiency, when combined with a dense architecture and 2025-era training data, can rival the previous generation's 70B+ models.