Alibaba Cloud

Qwen3.6-27B

An open-source (Apache 2.0), 27-billion-parameter dense multimodal model for agentic coding that surpasses the previous 397B-parameter MoE model on several agentic benchmarks, with a 262K-token context window that supports extension toward one million tokens.

27B params | Dense | 262K ctx | Multimodal


Capabilities

Chat
Code Generation
Vision
Reasoning
Function Calling
Multilingual
Math
Instruction Following

Model Specifications

Parameters: 27B
Active Params: 27B
Architecture: Dense
Context Length: 262K tokens
Modality: Multimodal
Provider: Alibaba Cloud
Download Size: 55.6 GB

Community

Monthly Downloads: 162.3K
Likes: 753
Last Updated: Yesterday

Quick Start

Run with Ollama

Copy and paste this command to start running the model locally.

ollama run qwen3.6:27b

Download from Hugging Face

Access model weights, configuration files, and documentation.


License

Apache 2.0

Performance & Scoring

Benchmarks

Benchmark	Score
GPQA	87.8
MMLU-PRO	86.2
SWE-Verified	77.2
HLE	24.0
AIME 2026	94.1
Terminal Bench	59.3
SWE-Pro	53.5
HMMT 2026	84.3

Overall Score: 57.4 (tier B)

Component	Weight	Score
Benchmark	40%	70.8
Popularity	25%	36.1
Efficiency	20%	33.3
Versatility	15%	89.0
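The overall score appears consistent with a plain weighted sum of the four components listed above. The following is a minimal sketch under that assumption; the function and variable names are illustrative and the page does not state its exact formula.

```python
# Component weights and scores as listed on this page.
# Assumption: the overall score is a simple weighted sum of these four values.
COMPONENTS = {
    "Benchmark": (0.40, 70.8),
    "Popularity": (0.25, 36.1),
    "Efficiency": (0.20, 33.3),
    "Versatility": (0.15, 89.0),
}


def overall_score(components):
    """Return the weighted sum of (weight, score) pairs; weights must sum to 1."""
    total_weight = sum(weight for weight, _ in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(weight * score for weight, score in components.values())


score = overall_score(COMPONENTS)
print(round(score, 1))  # 57.4, matching the overall score shown above
```

Under this reading, the high Versatility and Benchmark scores are pulled down by the lower Popularity and Efficiency components, which together carry 45% of the weight.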

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality	Notes
Q2_K	67.1 GB	Low	Aggressive quantization: smallest size, noticeable quality loss
Q4_K_M (Recommended)	72.8 GB	Good	Best balance of size and quality for most use-cases
Q5_K_M	75.5 GB	Very Good	Slightly better quality than Q4 with moderate size increase
Q6_K	78.7 GB	Excellent	Near-lossless quality with manageable size
Q8_0	85.5 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	111.1 GB	Full	Full 16-bit floating point: maximum quality, largest size

Hardware Compatibility

See which devices can run this model and at what quality level.

83 devices (first 25 listed)

Device	Vendor	Tier	Speed	VRAM Required
Intel Gaudi 3 AI Accelerator	Intel	S	40.9 tok/s	72.8 GB
NVIDIA H200 SXM 141GB	NVIDIA	S	53.1 tok/s	72.8 GB
AMD Instinct MI300X	AMD	S	58.6 tok/s	72.8 GB
NVIDIA B200 GPU	NVIDIA	S	88.5 tok/s	72.8 GB
AMD Instinct MI325X	AMD	S	66.4 tok/s	72.8 GB
AMD Instinct MI355X	AMD	S	88.5 tok/s	72.8 GB
Google Cloud TPU v5p	Google	S	30.6 tok/s	72.8 GB
ASUS ExpertCenter Pro ET900N G3	ASUS	S	78.5 tok/s	72.8 GB
Dell Pro Max with GB300	Dell	S	78.5 tok/s	72.8 GB
Gigabyte W775-V10-L01	Gigabyte	S	78.5 tok/s	72.8 GB
HP ZGX Fury AI Station	HP	S	78.5 tok/s	72.8 GB
MSI XpertStation WS300	MSI	S	78.5 tok/s	72.8 GB
SuperMicro Super AI Station	SuperMicro	S	78.5 tok/s	72.8 GB
Intel Gaudi 2 AI Accelerator	Intel	A	27.1 tok/s	72.8 GB
NVIDIA H100 SXM5 80GB	NVIDIA	A	37.1 tok/s	72.8 GB
Apple Mac Studio (M1 Ultra, 2022)	Apple	B	8.8 tok/s	72.8 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	B	6.8 tok/s	72.8 GB
MacBook Pro 16-inch M5 Max (2026)	Apple	B	6.8 tok/s	72.8 GB
MacBook Pro 16" M5 Max (2026)	Apple	B	6.8 tok/s	72.8 GB
Apple M4 Max (40-core GPU)	Apple	B	6.0 tok/s	72.8 GB
Apple Mac Studio (M4 Max, 2025)	Apple	B	6.0 tok/s	72.8 GB
MacBook Pro 14-inch M4 Max (2024)	Apple	B	6.0 tok/s	72.8 GB
MacBook Pro 16" M4 Max (2024)	Apple	B	6.0 tok/s	72.8 GB
Apple Mac Studio (M2 Ultra, 2023)	Apple	B	8.8 tok/s	72.8 GB
NVIDIA A100 SXM4 80GB	NVIDIA	B	22.6 tok/s	72.8 GB

About This Model

Qwen3.6-27B is a dense, multimodal model from Alibaba Cloud that raises the performance ceiling for the 20B-40B parameter class. While many competitors have moved to Mixture-of-Experts (MoE) architectures to maintain speed, Qwen3.6-27B remains dense, and its reasoning and agentic consistency are usually associated with much larger models, including the 397B-parameter MoE it surpasses on several agentic benchmarks. It is designed for repository-level coding, complex tool use, and long-context vision tasks, which makes it a strong candidate for developers who need flagship-grade performance on local workstations.

Released under the Apache 2.0 license, this model responds to the demand for capable open-weights models that run on local hardware. It bridges the gap between high-end consumer hardware and enterprise-grade inference, with a 262,144-token context window that allows local analysis of entire codebases or large document sets without relying on cloud APIs.

Architecture and Technical Details

The Qwen3.6-27B architecture is a hybrid. It is a dense model with 27 billion parameters, but it departs from the standard Transformer design by combining a "Gated DeltaNet" linear attention mechanism with traditional self-attention. The hybrid is meant to mitigate the quadratic scaling of standard attention, which makes the 262k context length practical while keeping throughput high.
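To make the scaling argument concrete, the sketch below compares how the work of full self-attention and of a linear-time mechanism grows when the context goes from 8,192 to 262,144 tokens. It counts token-pair scores per head and per layer, ignores the roughly 2x saving from causal masking, and says nothing about the model's actual layer mix; the function names are illustrative.

```python
def attention_pairs(n_tokens):
    """Token-pair scores computed by full self-attention over n tokens."""
    return n_tokens * n_tokens


def growth(n_small, n_large):
    """Return (linear_factor, quadratic_factor) when context grows from n_small to n_large."""
    ratio = n_large / n_small
    return ratio, ratio ** 2


linear_factor, quadratic_factor = growth(8_192, 262_144)
print(attention_pairs(262_144))  # 68719476736 pair scores at the full window
print(linear_factor)             # 32.0: cost growth for a linear-time layer
print(quadratic_factor)          # 1024.0: cost growth for a full-attention layer
```

A 32x longer context costs a linear-attention layer about 32x more work, but a full-attention layer about 1024x more, which is why replacing some attention layers with a linear mechanism matters at this context length.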

Key technical specifications for local deployment include:

Parameters: 27B (Dense)
Architecture: Hybrid Gated DeltaNet + Gated Attention
Context Window: 262,144 tokens
Layers: 64
Hidden Dimension: 5120
Vocabulary Size: 248,320 tokens
Quantization Support: Native FP8 support with block-wise quantization (block size 128) and standard GGUF/EXL2 compatibility.
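Block-wise quantization, as named in the last item above, stores one scale per block of values so that an outlier only costs precision inside its own block. The sketch below is a simplified one-dimensional illustration under stated assumptions: it rounds to an int8 grid as a stand-in because plain Python has no FP8 type, and it does not reproduce the model's actual FP8 kernels or weight layout.

```python
BLOCK_SIZE = 128  # block size stated in the spec list
QMAX = 127        # int8 stand-in; an FP8 E4M3 path would scale to its maximum of 448


def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Return (codes, scales): integer codes plus one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / QMAX if absmax > 0 else 1.0
        scales.append(scale)
        # Round to the nearest grid point and clamp to the representable range.
        codes.extend(max(-QMAX, min(QMAX, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=BLOCK_SIZE):
    """Reconstruct approximate values from codes and per-block scales."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]


# Two blocks of small weights, with one large outlier in the second block.
weights = [0.01 * ((i % 7) - 3) for i in range(256)]
weights[200] = 5.0

codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
worst_first_block = max(abs(a - b) for a, b in zip(weights[:128], restored[:128]))
print(scales)             # the second scale is much larger than the first
print(worst_first_block)  # stays small: the outlier does not affect block 0
```

With a single scale for all 256 values, the outlier would set the step size for every weight; with per-block scales, only the 128 values that share its block lose precision.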

The model also introduces "Thinking Preservation," a mechanism that allows it to retain internal reasoning chains across multi-turn conversations. For developers, this means the model doesn't "forget" the logic it established in earlier steps of a complex debugging or refactoring task, which reduces the lost-context errors common in long agentic workflows.

Capabilities and Use Cases

Qwen3.6-27B is not a general-purpose "chat" model in the traditional sense; it is a functional tool optimized for high-logic workloads. Its multimodal nature allows it to process interleaved text and image data, but its primary strengths lie in its "agentic" capabilities.

Agentic Coding and Repository Reasoning

The model is specifically tuned for frontend workflows and repository-level reasoning. Unlike smaller coding models that focus on single-function completion, Qwen3.6-27B can navigate complex file structures and understand dependencies across a project. This makes it ideal for running local coding assistants like aider or Claude Code (via local providers), where the model must act as an agent to find, diagnose, and fix bugs across multiple files.

Vision and Document Intelligence

With its integrated vision encoder, the model excels at OCR, architectural diagram analysis, and UI/UX auditing. It can ingest a screenshot of a frontend bug and suggest the specific CSS or React code to fix it. Its ability to handle high-resolution images within a massive context window makes it a powerful asset for RAG (Retrieval-Augmented Generation) pipelines involving technical manuals and schematics.

Multi-Step Tool Use and Function Calling

Qwen3.6-27B outperforms previous 397B parameter MoE models on several agentic benchmarks. It follows complex JSON schemas and executes multi-step function calls reliably. If you are building a local autonomous agent to manage file systems or interact with APIs, this instruction-following stability helps reduce loop errors, where an agent repeats the same call without making progress.
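Even with a reliable model, an agent harness usually validates each tool call and caps the number of steps. The following is a minimal sketch of such a harness, not an API of the model or of any framework: the tool names, the JSON shape (`name` plus `arguments`), and the scripted `model_outputs` list that stands in for real model responses are all hypothetical.

```python
import json

# Hypothetical tool registry: name -> (required argument names, implementation).
TOOLS = {
    "list_dir": (("path",), lambda path: ["a.py", "b.py"]),
    "read_file": (("path",), lambda path: f"<contents of {path}>"),
    "final_answer": (("text",), None),
}


def parse_tool_call(raw):
    """Parse one JSON tool call and check it against the registry."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    if set(args) != set(TOOLS[name][0]):
        raise ValueError(f"{name} expects exactly {TOOLS[name][0]}")
    return name, args


def run_agent(model_outputs, max_steps=5):
    """Execute tool calls until a final answer, a repeated call, or the step cap."""
    seen, transcript = set(), []
    for step, raw in enumerate(model_outputs):
        if step >= max_steps:
            raise RuntimeError("step limit reached without a final answer")
        if raw in seen:
            raise RuntimeError(f"repeated call detected: {raw}")
        seen.add(raw)
        name, args = parse_tool_call(raw)
        if name == "final_answer":
            return args["text"], transcript
        transcript.append((name, TOOLS[name][1](**args)))
    raise RuntimeError("model stopped without a final answer")


scripted = [
    '{"name": "list_dir", "arguments": {"path": "."}}',
    '{"name": "final_answer", "arguments": {"text": "done"}}',
]
answer, transcript = run_agent(scripted)
print(answer, transcript)
```

The repeated-call check and the step cap are the two guards that turn a looping model into a clear error instead of an endless run.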

Running Qwen3.6-27B Locally

Running a 27B parameter model requires careful consideration of VRAM and quantization. Because it is a dense model, every generated token reads all 27B parameters, so the full set of weights should sit in fast memory to achieve acceptable speeds; unlike MoE models, there are no "inactive" parameters during inference.

Qwen3.6-27B VRAM Requirements

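To estimate your hardware needs, use these general targets for the 27B model. They approximate the weights plus a small runtime overhead at short context; long contexts add KV-cache memory on top. The Quantization Options table on this page lists considerably larger figures (for example 72.8 GB at Q4_K_M), which appear to budget for more than the weights alone: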

BF16 (Unquantized): ~58GB VRAM. This exceeds 2x RTX 3090/4090 (48GB combined) and a single 48GB A6000; it requires an 80GB-class GPU such as an A100 80GB, or three or more 24GB cards.
Q8_0 Quantization: ~30GB VRAM (requires 2x RTX 3090/4090 or a Mac Studio with 64GB+ of Unified Memory).
Q4_K_M Quantization (Recommended): ~18-20GB VRAM. This is the "sweet spot" for most users, fitting on a single 24GB RTX 3090/4090 or an Apple M2/M3/M4 Max with at least 32GB of Unified Memory.
Q2_K (Minimum): ~10-12GB VRAM. Possible on a 12GB card such as an RTX 3060 or RTX 4070, but expect significant logic degradation.
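These targets follow from simple arithmetic: parameters times bits per weight, divided by eight. The sketch below shows the calculation. Q8_0 stores 8.5 bits per weight by construction; the Q4_K_M and Q2_K values and the 2 GB overhead are rough assumptions for illustration, not measured file sizes.

```python
PARAMS = 27e9  # parameter count from the spec list

# Approximate bits per weight. The K-quant values are assumed averages.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "Q2_K": 3.0,
}


def weights_gb(params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def fits(params, bits_per_weight, vram_gb, overhead_gb=2.0):
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weights_gb(params, bits_per_weight) + overhead_gb <= vram_gb


for name, bits in BITS_PER_WEIGHT.items():
    size = weights_gb(PARAMS, bits)
    print(f"{name}: {size:.1f} GB weights, "
          f"fits 24 GB: {fits(PARAMS, bits, 24)}, "
          f"fits 48 GB: {fits(PARAMS, bits, 48)}")
```

The BF16 weights alone come to 54 GB, which is why two 24GB cards are not enough for the unquantized model, while Q4_K_M leaves several gigabytes of headroom on a single 24GB card.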

Hardware Recommendations

For the best Qwen3.6-27B performance, prioritize memory bandwidth.

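NVIDIA Users: A single RTX 4090 (24GB) is the gold standard for this model. With 4-bit or 5-bit quantization (GGUF or EXL2), expect up to roughly 40-60 tokens per second at short context; the upper end is close to the card's memory-bandwidth ceiling.
Mac Users: Apple Silicon machines with 32GB+ of Unified Memory can load the Q4_K_M build. Generation speed tracks memory bandwidth, so Max and Ultra chips are the practical choice; the Hardware Compatibility table on this page lists an M4 Max at 6.0 tok/s in its 72.8 GB configuration.
Dual GPU: For Q8 (~30GB), a dual RTX 3090 setup (48GB combined) over NVLink or PCIe is the most cost-effective path. Unquantized BF16 (~58GB) does not fit in 48GB and needs more or larger GPUs.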
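The advice to prioritize memory bandwidth can be turned into a rough ceiling: a dense model reads every weight once per generated token, so single-stream decode speed cannot exceed bandwidth divided by the size of the weights. The sketch below assumes about 4.85 bits per weight for a 4-bit GGUF build and uses vendor-published peak bandwidth figures; real throughput is lower and falls further as context grows.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, weights_gb):
    """Upper bound on single-stream decode speed for a dense model.

    Each generated token reads every weight once, so tokens per second
    cannot exceed memory bandwidth divided by the size of the weights.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumed weight size for a 4-bit build: 27B parameters at ~4.85 bits each.
Q4_WEIGHTS_GB = 27e9 * 4.85 / 8 / 1e9

# Vendor-published peak memory bandwidth in GB/s (best-case figures).
PEAK_BANDWIDTH_GB_S = {
    "RTX 4090": 1008.0,
    "RTX 3090": 936.0,
    "M4 Max (40-core GPU)": 546.0,
}

for device, bandwidth in PEAK_BANDWIDTH_GB_S.items():
    ceiling = decode_ceiling_tok_s(bandwidth, Q4_WEIGHTS_GB)
    print(f"{device}: at most {ceiling:.0f} tok/s")
```

The bound ignores KV-cache reads, kernel efficiency, and batching, so it is best used to rank devices rather than to predict an exact speed.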

Quick Start with Ollama

The fastest way to run Qwen3.6-27B locally is via Ollama. Once installed, you can pull the model directly:

ollama run qwen3.6:27b

For coding-specific tasks, set the context length explicitly if you need long inputs: Ollama may default to a much lower limit (e.g., 8k or 32k) depending on your system's available memory, and using the full 262k window needs considerably more memory than the weights alone.
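One way to request a larger window is the `num_ctx` option of Ollama's REST API. The sketch below builds such a request with the standard library, assuming a local Ollama server on its default port; the helper names and the 65,536-token value are illustrative, and a large `num_ctx` only works if the machine has memory for the resulting KV cache.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt, num_ctx, model="qwen3.6:27b"):
    """Build a non-streaming generate request with an explicit context length."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, num_ctx):
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_generate_request(prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running server with the model pulled):
#   print(generate("Summarize this repository layout: ...", num_ctx=65536))
```

Starting with a moderate value and increasing it while watching memory use is safer than requesting the full 262,144 tokens at once.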

How It Compares

When evaluating Qwen3.6-27B, it is most often compared to Gemma 2 27B and Llama 3.1 70B.

Qwen3.6-27B vs. Gemma 2 27B: While Gemma 2 is a formidable logic model, Qwen3.6-27B generally leads in coding tasks and offers a significantly larger context window (262k vs 8k). Qwen's multimodal capabilities also give it a versatility edge that Gemma 2 lacks.
Qwen3.6-27B vs. Llama 3.1 70B: Llama 3.1 70B is a much larger model that requires significantly more hardware (typically 2x 3090s even at 4-bit). Surprisingly, Qwen3.6-27B often matches or exceeds Llama 70B in specialized agentic coding benchmarks while being much easier to run on a single consumer GPU.
Qwen3.6-27B vs. Qwen3.5-397B (MoE): Despite having fewer total parameters, the 27B dense model often provides more "coherent" long-form reasoning. The 397B MoE version is faster for simple queries because it only activates a fraction of its weights, but for deep repository-level debugging, the 27B dense architecture is often more precise.

For practitioners looking for a "daily driver" model that fits on a single high-end GPU without sacrificing the ability to handle complex, multi-file coding projects, Qwen3.6-27B is currently the most efficient choice in the 20B-40B parameter range.

Related Models

Qwen 3.5 Omni (Alibaba Cloud): 397B, MoE
Qwen3.6 35B-A3B (Alibaba Cloud): 35B, MoE

