
A fully open-source, 27-billion-parameter dense multimodal model delivering flagship-level agentic coding that surpasses previous 397B-parameter architectures, with native million-token context extension.
Copy and paste this command to start running the model locally.
ollama run qwen3.6:27b
Access model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 67.1 GB | Low |
| Q4_K_M (Recommended) | 72.8 GB | Good |
| Q5_K_M | 75.5 GB | Very Good |
| Q6_K | 78.7 GB | Excellent |
| Q8_0 | 85.5 GB | Near Perfect |
| FP16 | 111.1 GB | Full |
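To make the table actionable, here is a minimal Python sketch that picks the highest-quality format whose listed requirement fits a given memory budget. The figures are copied from the table above; the function name `best_quant_for` and the `headroom_gb` parameter are illustrative assumptions, not part of any tool.

```python
from typing import Optional

# VRAM figures (GB) copied from the table above, ordered from lowest to highest quality.
QUANT_VRAM_GB = [
    ("Q2_K", 67.1),
    ("Q4_K_M", 72.8),
    ("Q5_K_M", 75.5),
    ("Q6_K", 78.7),
    ("Q8_0", 85.5),
    ("FP16", 111.1),
]


def best_quant_for(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-quality format that fits, or None if nothing fits.

    headroom_gb is a hypothetical safety margin reserved for other processes.
    """
    usable = vram_gb - headroom_gb
    best = None
    for name, required in QUANT_VRAM_GB:
        if required <= usable:
            best = name  # later entries are higher quality, so keep overwriting
    return best
```

For example, `best_quant_for(80)` selects `Q6_K`, while reserving 2 GB of headroom with `best_quant_for(80, headroom_gb=2)` falls back to `Q5_K_M`.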
See which devices can run this model and at what quality level.
| Device | Vendor | Tier | Speed | VRAM Required |
|---|---|---|---|---|
| | | SS | 40.9 tok/s | 72.8 GB |
| NVIDIA H200 SXM 141GB | NVIDIA | SS | 53.1 tok/s | 72.8 GB |
| | | SS | 58.6 tok/s | 72.8 GB |
| NVIDIA B200 GPU | NVIDIA | SS | 88.5 tok/s | 72.8 GB |
| | | SS | 66.4 tok/s | 72.8 GB |
| | | SS | 88.5 tok/s | 72.8 GB |
| Google Cloud TPU v5p | Google | SS | 30.6 tok/s | 72.8 GB |
| | | SS | 78.5 tok/s | 72.8 GB |
| | | SS | 78.5 tok/s | 72.8 GB |
| Gigabyte W775-V10-L01 | Gigabyte | SS | 78.5 tok/s | 72.8 GB |
| | | SS | 78.5 tok/s | 72.8 GB |
| | | SS | 78.5 tok/s | 72.8 GB |
| SuperMicro Super AI Station | SuperMicro | SS | 78.5 tok/s | 72.8 GB |
| | | AA | 27.1 tok/s | 72.8 GB |
| NVIDIA H100 SXM5 80GB | NVIDIA | AA | 37.1 tok/s | 72.8 GB |
| | | BB | 8.8 tok/s | 72.8 GB |
| | | BB | 6.8 tok/s | 72.8 GB |
| | | BB | 6.8 tok/s | 72.8 GB |
| | | BB | 6.8 tok/s | 72.8 GB |
| | | BB | 6.0 tok/s | 72.8 GB |
| | | BB | 6.0 tok/s | 72.8 GB |
| | | BB | 6.0 tok/s | 72.8 GB |
| | | BB | 6.0 tok/s | 72.8 GB |
| | | BB | 8.8 tok/s | 72.8 GB |
| NVIDIA A100 SXM4 80GB | NVIDIA | BB | 22.6 tok/s | 72.8 GB |
Qwen3.6-27B is a dense, multimodal model from Alibaba Cloud that resets the performance ceiling for the 20B–40B parameter class. While many competitors have pivoted to Mixture-of-Experts (MoE) architectures to maintain speed, Qwen3.6-27B remains a dense model, delivering a level of reasoning and "agentic" consistency that typically requires models ten times its size. It is explicitly designed for repository-level coding, complex tool-use, and long-context vision tasks, making it a primary candidate for developers who need flagship-grade performance on local workstations.
Released under the Apache 2.0 license, this model responds to the demand for capable open-weights models that run on attainable hardware. It bridges the gap between mid-range consumer hardware and enterprise-grade inference, providing a 262,144-token context window that enables local analysis of entire codebases or large document sets without relying on cloud APIs.
The Qwen3.6-27B architecture is a hybrid. While it is a dense model with 27 billion parameters, it departs from standard Transformer designs by incorporating a "Gated DeltaNet" linear attention mechanism alongside traditional self-attention. This hybrid approach is engineered to mitigate the quadratic scaling of standard attention with sequence length, allowing for its 262k context length while maintaining high throughput.
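To give a feel for why quadratic scaling matters at this context length, the following sketch compares how the amount of work grows for causal self-attention versus a linear-time recurrence. It illustrates scaling only: the counts are not FLOPs, the function names are hypothetical, and nothing here models Gated DeltaNet itself.

```python
CONTEXT_TOKENS = 262_144  # context window stated in this article


def causal_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by causal self-attention: each token attends
    to itself and every earlier token, giving n(n+1)/2 pairs."""
    return n_tokens * (n_tokens + 1) // 2


def linear_attention_steps(n_tokens: int) -> int:
    """A linear-attention recurrence updates a fixed-size state once per token."""
    return n_tokens


pairs = causal_attention_pairs(CONTEXT_TOKENS)
steps = linear_attention_steps(CONTEXT_TOKENS)
print(f"causal attention pairs: {pairs:,}")
print(f"linear recurrence steps: {steps:,}")
print(f"ratio: {pairs / steps:,.1f}x")
```

Doubling the context doubles the linear count but roughly quadruples the pairwise count, which is why hybrid designs replace some full-attention layers with linear ones.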
Key technical specifications for local deployment include:
The model also introduces "Thinking Preservation," a mechanism that allows the model to retain internal reasoning chains across multi-turn conversations. For developers, this means the model doesn't "forget" the logic it established in previous steps of a complex debugging or refactoring task, significantly reducing the needle-in-a-haystack errors common in long-form agentic workflows.
Qwen3.6-27B is not a general-purpose "chat" model in the traditional sense; it is a functional tool optimized for high-logic workloads. Its multimodal nature allows it to process interleaved text and image data, but its primary strengths lie in its "agentic" capabilities.
The model is specifically tuned for frontend workflows and repository-level reasoning. Unlike smaller coding models that focus on single-function completion, Qwen3.6-27B can navigate complex file structures and understand dependencies across a project. This makes it ideal for running local coding assistants like aider or Claude Code (via local providers), where the model must act as an agent to find, diagnose, and fix bugs across multiple files.
With its integrated vision encoder, the model excels at OCR, architectural diagram analysis, and UI/UX auditing. It can ingest a screenshot of a frontend bug and suggest the specific CSS or React code to fix it. Its ability to handle high-resolution images within a massive context window makes it a powerful asset for RAG (Retrieval-Augmented Generation) pipelines involving technical manuals and schematics.
Qwen3.6-27B outperforms previous 397B parameter MoE models on several agentic benchmarks. It is highly reliable at following complex JSON schemas and executing multi-step function calls. If you are building a local autonomous agent to manage file systems or interact with APIs, this model provides the necessary instruction-following stability to prevent loop errors.
Running a 27B parameter model requires careful consideration of VRAM and quantization. Because it is a dense model, you must fit all 27B parameters into memory to achieve acceptable speeds; unlike MoE models, there are no "inactive" parameters during inference.
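As a rough illustration of the weight footprint alone, the sketch below multiplies the parameter count by the bits stored per weight. It rests on two assumptions: nominal bit widths are used (real quantization formats store extra scaling metadata), and nothing besides weights is counted. The KV cache and runtime buffers come on top, so published VRAM requirements for a given format can be considerably higher than these numbers.

```python
PARAMS = 27e9  # parameter count stated in this article


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Nominal bit widths; actual formats use slightly more bits per weight.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.1f} GB of weights")
```

Because the model is dense, every one of these bytes is read for every generated token, which is what ties memory size so closely to speed.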
To calculate your hardware needs, use these general targets for the 27B model:
For the best Qwen3.6-27B performance, prioritize memory bandwidth.
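A common back-of-the-envelope way to see why bandwidth matters: during decoding, a dense model reads roughly all of its resident weights for each generated token, so throughput is bounded by bandwidth divided by bytes read. The sketch below encodes that estimate. The function name, the `efficiency` factor, and the example figures are hypothetical, and real throughput also depends on the runtime, batch size, and context length.

```python
def decode_tokens_per_second(
    bandwidth_gb_s: float,
    bytes_read_gb: float,
    efficiency: float = 1.0,
) -> float:
    """Estimate decode speed for a memory-bandwidth-bound dense model.

    bandwidth_gb_s: device memory bandwidth in GB/s.
    bytes_read_gb:  data read per generated token in GB (about the loaded model size).
    efficiency:     hypothetical fraction of peak bandwidth actually achieved (0 to 1].
    """
    if bytes_read_gb <= 0:
        raise ValueError("bytes_read_gb must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return bandwidth_gb_s / bytes_read_gb * efficiency


# Hypothetical device: 1000 GB/s of bandwidth, 20 GB read per token, 80% efficiency.
print(decode_tokens_per_second(1000, 20, efficiency=0.8))
```

Under this estimate, halving the bytes read per token (for example through heavier quantization) or doubling the bandwidth each doubles the speed ceiling.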
With 4-bit or 5-bit quantization (GGUF or EXL2), you can expect 40-60 tokens per second.
The fastest way to run Qwen3.6-27B locally is via Ollama. Once installed, you can pull the model directly:
ollama run qwen3.6:27b
For coding-specific tasks, ensure your environment is configured to utilize the full 262k context window, as Ollama may default to a lower limit (e.g., 8k or 32k) depending on your system's available memory.
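One way to request a larger window is the `num_ctx` option in Ollama's REST API. The sketch below builds such a request with the Python standard library. It assumes an Ollama server on its default local port and the `qwen3.6:27b` tag from the command above; the helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
FULL_CONTEXT = 262_144  # context window stated in this article


def build_request(prompt: str, num_ctx: int = FULL_CONTEXT) -> urllib.request.Request:
    """Build a non-streaming generate request with an explicit context size."""
    payload = {
        "model": "qwen3.6:27b",
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # overrides the default context length
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, num_ctx: int = FULL_CONTEXT) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `generate("Summarize this repository layout.", num_ctx=32_768)` requires the server to be running. A larger `num_ctx` increases memory use, so it may be worth starting below the full window and raising it as memory allows.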
When evaluating Qwen3.6-27B, it is most often compared to Gemma 2 27B and Llama 3.1 70B.
For practitioners looking for a "daily driver" model that fits on a single high-end GPU without sacrificing the ability to handle complex, multi-file coding projects, Qwen3.6-27B is currently the most efficient choice in the 20B-40B parameter range.