
Dense 27B natively multimodal model. All parameters active per forward pass, giving highest per-token reasoning density in the Qwen3.5 series. Ties GPT-5 mini on SWE-bench Verified (72.4).
Copy and paste this command to start running the model locally:

```
ollama run qwen3.5:27b
```

Access model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | Notes |
|---|---|---|---|
| Q2_K | 67.1 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_M (Recommended) | 72.8 GB | Good | Best balance of size and quality for most use cases |
| Q5_K_M | 75.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 78.7 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 85.5 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 111.1 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.

| Device | Tier | Throughput | VRAM Used |
|---|---|---|---|
| NVIDIA B200 | S | 88.5 tok/s | 72.8 GB |
| NVIDIA H200 SXM 141GB | S | 53.1 tok/s | 72.8 GB |
| NVIDIA H100 SXM5 80GB | A | 37.1 tok/s | 72.8 GB |
| Google Cloud TPU v5p | S | 30.6 tok/s | 72.8 GB |
| NVIDIA A100 SXM4 80GB | B | 22.6 tok/s | 72.8 GB |
Qwen3.5-27B is Alibaba Cloud’s 2025 flagship mid-sized model, engineered to maximize the "reasoning density" possible on high-end consumer hardware. Unlike the trend toward Mixture-of-Experts (MoE) architectures, Qwen3.5-27B utilizes a dense architecture where all 27 billion parameters are active during every forward pass. This design choice prioritizes raw intelligence and instruction-following precision over the lower inference costs of MoE models, positioning it as a direct competitor to much larger models.
In the local AI ecosystem, this model occupies the "sweet spot" for practitioners: it is small enough to fit on a single high-end consumer GPU when quantized, yet powerful enough to tie GPT-5 mini on the SWE-bench Verified benchmark with a score of 72.4. For developers looking to run Qwen3.5-27B locally, it offers a natively multimodal experience, handling text, code, and vision tasks within a massive 262,144-token context window.
Qwen3.5-27B's dense transformer architecture sets the 2025 standard for local models in the 27B-parameter class. While MoE models (like Mixtral) achieve speed by activating only a fraction of their parameters per token, Qwen3.5-27B's dense design ensures that every generated token benefits from the full 27B parameters. This yields higher per-token intelligence, particularly in edge cases and complex logic where MoE routing can sometimes falter.
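The per-token cost difference between the two designs can be sketched with quick arithmetic. The Mixtral-style figures below (roughly 47B total parameters, about 12.9B active with 2 of 8 experts routed) are published approximations used purely for illustration, not exact architecture numbers.

```python
# Rough per-token active-parameter comparison: dense vs. MoE.

def active_params_dense(total_b: float) -> float:
    """A dense model uses every parameter on every forward pass."""
    return total_b

def active_params_moe(shared_b: float, expert_b: float,
                      experts_active: int) -> float:
    """An MoE model activates shared weights plus k routed experts."""
    return shared_b + experts_active * expert_b

dense = active_params_dense(27.0)        # Qwen3.5-27B: all 27B active
moe = active_params_moe(1.3, 5.8, 2)     # Mixtral-like: ~12.9B active

print(f"dense: {dense:.1f}B active, MoE: {moe:.1f}B active per token")
```

The dense model spends roughly twice the per-token compute of the MoE example, which is exactly the trade the surrounding text describes: higher inference cost in exchange for every token seeing the full weight set.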
The model features a native context length of 262,144 tokens. This allows for the ingestion of entire codebases, long technical manuals, or hours of transcript data without the "lost in the middle" phenomena common in smaller context models. Because it is natively multimodal, the vision capabilities are integrated into the same architecture, allowing for interleaved image and text processing without relying on a separate vision encoder-adapter setup.
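To gauge whether a codebase or document set actually fits in that 262,144-token window, a common back-of-envelope heuristic is about 4 characters per token for English text and code. This sketch uses that ratio; the model's real tokenizer will produce somewhat different counts.

```python
# Heuristic check of whether text fits the 262,144-token context.
# The ~4 characters/token ratio is a rough average, not the
# model's actual tokenizer.

CONTEXT_TOKENS = 262_144
CHARS_PER_TOKEN = 4  # rough heuristic

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], reserve_for_output: int = 4096) -> bool:
    # Leave headroom for the model's own generated tokens.
    budget = CONTEXT_TOKENS - reserve_for_output
    return sum(estimated_tokens(t) for t in texts) <= budget

# A ~500 KB codebase is roughly 125k estimated tokens,
# comfortably inside the window.
print(fits_in_context(["x" * 500_000]))
```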
Qwen3.5-27B performance shines in high-stakes reasoning tasks that typically require a 70B+ parameter model. Its 2025 training cutoff ensures it is familiar with the latest frameworks and libraries, making it a premier choice for technical workflows.
The model is a top-tier performer for coding tasks. With a 72.4 score on SWE-bench Verified, it can resolve real-world GitHub issues, perform complex refactoring, and generate unit tests across multiple files. The 262k context window is particularly useful here, as it allows the model to "see" the entire project structure during a debugging session.
As a natively multimodal model, Qwen3.5-27B also excels at vision-language tasks, handling interleaved image and text inputs within the same architecture.
Qwen has historically led in multilingual benchmarks. The 3.5-27B variant continues this, offering GPT-4 level performance in languages across Asia and Europe. Its mathematical reasoning is robust, handling calculus, symbolic logic, and competitive programming problems with high accuracy.
To run Qwen3.5-27B locally, hardware selection is dictated by VRAM. Because this is a dense 27B model, the memory footprint is larger than an 8B model but significantly more manageable than a 70B model.
VRAM usage depends entirely on your choice of quantization. To calculate Qwen3.5-27B hardware requirements, use this breakdown:
| Quantization | VRAM (Weights Only) | Recommended Total VRAM (incl. Context) |
| :--- | :--- | :--- |
| BF16 (Unquantized) | ~54 GB | 64 GB+ (Requires 2x RTX 3090/4090 or A6000) |
| Q8_0 (8-bit) | ~29 GB | 32 GB+ (Requires 2x RTX 3090/4090) |
| Q4_K_M (4-bit) | ~17 GB | 24 GB (Fits on a single RTX 3090/4090) |
| IQ3_M (3-bit) | ~12 GB | 16 GB (Fits on RTX 4080 or 16GB Mac) |
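The weight figures above follow directly from parameter count times bits per weight. The sketch below reproduces them; the bits-per-weight values are approximate averages for the llama.cpp quantization formats named in the table.

```python
# Back-of-envelope weight memory: params × bits-per-weight / 8.
# Bits-per-weight values are approximate llama.cpp averages.

PARAMS = 27e9  # Qwen3.5-27B

def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    return params * bits_per_weight / 8 / 1e9  # decimal GB

for name, bpw in [("BF16", 16.0), ("Q8_0", 8.5),
                  ("Q4_K_M", 4.85), ("IQ3_M", 3.66)]:
    print(f"{name}: ~{weight_gb(bpw):.0f} GB")
```

This covers weights only; as the table's right-hand column notes, you still need headroom on top for the KV cache and runtime overhead.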
For most practitioners, Q4_K_M (4-bit) is the best quantization for Qwen3.5-27B. At 4-bit, the model retains over 99% of its FP16 intelligence while fitting comfortably into the 24GB VRAM buffer of an RTX 3090 or 4090. This leaves roughly 6GB of VRAM for KV cache (context), allowing for a respectable context window of 16k–32k tokens before needing to offload to system RAM.
The fastest way to deploy this model is via Ollama. Once installed, run:
```
ollama run qwen3.5:27b
```
This will automatically pull a 4-bit quantized version optimized for your available hardware.
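Once the model is pulled, Ollama exposes it over a local REST API, so you can script against it instead of using the interactive prompt. This minimal stdlib-only sketch targets Ollama's standard `/api/generate` endpoint on the default port 11434.

```python
# Minimal sketch of querying a locally running Ollama server
# after `ollama run qwen3.5:27b` has pulled the model.
import json
import urllib.request

def build_request(prompt: str, model: str = "qwen3.5:27b") -> dict:
    # stream=False returns one JSON object instead of NDJSON chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the Ollama server to be running):
#   print(generate("Summarize this function's behavior: ..."))
```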
When evaluating Qwen3.5-27B, the most realistic alternatives are Gemma 2 27B and Llama 3.1 8B/70B.
Google’s Gemma 2 27B is the closest architectural rival: it is likewise a dense model at the same parameter count, but it is text-only and ships with a much smaller 8K context window, leaving Qwen3.5-27B ahead on multimodal and long-context workloads.
Qwen3.5-27B is currently the most capable model available for users restricted to a 24GB VRAM envelope who refuse to compromise on coding performance or multimodal capabilities. Its reasoning benchmark scores show that parameter efficiency, when combined with a dense architecture and 2025-era training data, can rival the previous generation's 70B+ models.