made by agents

Google's most capable open-weight single-GPU model. Dense 27B multimodal with 128K context, 140+ languages. Beats Gemini 1.5 Pro on several benchmarks.
Copy and paste this command to start running the model locally.
ollama run gemma3:27bAccess model weights, configuration files, and documentation.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality | |
|---|---|---|---|
| Q2_K | 38.1 GB | Low | Aggressive quantization — smallest size, noticeable quality loss |
| Q4_K_MRecommended | 43.8 GB | Good | Best balance of size and quality for most use-cases |
| Q5_K_M | 46.5 GB | Very Good | Slightly better quality than Q4 with moderate size increase |
| Q6_K | 49.7 GB | Excellent | Near-lossless quality with manageable size |
| Q8_0 | 56.5 GB | Near Perfect | Virtually indistinguishable from full precision |
| FP16 | 82.1 GB | Full | Full 16-bit floating point — maximum quality, largest size |
See which devices can run this model and at what quality level.
NVIDIA H100 SXM5 80GBNVIDIA | SS | 61.6 tok/s | 43.8 GB | |
Google Cloud TPU v5pGoogle | SS | 50.8 tok/s | 43.8 GB | |
| SS | 45.0 tok/s | 43.8 GB | ||
NVIDIA A100 SXM4 80GBNVIDIA | SS | 37.5 tok/s | 43.8 GB | |
| SS | 68.0 tok/s | 43.8 GB | ||
NVIDIA H200 SXM 141GBNVIDIA | SS | 88.2 tok/s | 43.8 GB | |
| SS | 97.4 tok/s | 43.8 GB | ||
NVIDIA B200 GPUNVIDIA | SS | 147.0 tok/s | 43.8 GB | |
| SS | 110.3 tok/s | 43.8 GB | ||
| SS | 147.0 tok/s | 43.8 GB | ||
| AA | 14.7 tok/s | 43.8 GB | ||
| BB | 7.4 tok/s | 43.8 GB | ||
| BB | 14.7 tok/s | 43.8 GB | ||
| BB | 11.3 tok/s | 43.8 GB | ||
| BB | 11.3 tok/s | 43.8 GB | ||
| BB | 11.3 tok/s | 43.8 GB | ||
| BB | 7.4 tok/s | 43.8 GB | ||
| BB | 10.0 tok/s | 43.8 GB | ||
| BB | 10.0 tok/s | 43.8 GB | ||
| BB | 10.0 tok/s | 43.8 GB | ||
| BB | 10.0 tok/s | 43.8 GB | ||
| BB | 5.6 tok/s | 43.8 GB | ||
| BB | 5.0 tok/s | 43.8 GB | ||
| BB | 5.0 tok/s | 43.8 GB | ||
| BB | 3.8 tok/s | 43.8 GB |
Gemma 3 27B IT is Google’s premier open-weight model designed for high-performance local inference on single-GPU workstations. As a dense 27-billion parameter model, it occupies the "Goldilocks" zone of local AI: it is significantly more capable than 8B-class models while remaining far more accessible to run than 70B+ models that typically require multi-GPU setups.
Built on the same technological foundations as the Gemini family, Gemma 3 27B IT is a multimodal, instruction-tuned model that supports both text and vision inputs. It features a massive 128,000-token context window and has been trained on a dataset with a cutoff of August 2024. Most notably, this model achieves parity with or exceeds Gemini 1.5 Pro on several key benchmarks, making it the most capable model in its weight class for practitioners who prioritize local data privacy and low-latency response times.
Unlike Mixture-of-Experts (MoE) architectures that only activate a fraction of their parameters during inference, Gemma 3 27B IT utilizes a dense architecture. In a dense model, all 27 billion parameters are active for every token generated. While this requires more compute per token than an MoE model of a similar total size, it generally results in higher "intelligence per parameter" and more stable reasoning capabilities.
The 128K context window is a significant upgrade over previous generations, allowing developers to feed entire codebases, long technical documents, or complex image-heavy PDFs into the model for local processing. Because the model is natively multimodal, it does not rely on a separate vision encoder "bolted on" to a text model; the vision and text capabilities are integrated into the core architecture, leading to more cohesive reasoning across different data types.
Google expanded the training recipe for Gemma 3 to include support for over 140 languages. This makes it one of the most robust multilingual open-weight models available. The training data emphasizes high-quality reasoning, mathematics, and programming, which is reflected in its instruction-following accuracy.
Gemma 3 27B IT is designed for complex workflows that require a high degree of nuance. It is not just a chatbot; it is a reasoning engine capable of handling structured data and visual inputs simultaneously.
This model is a top-tier choice for local coding assistance. It excels at:
With native vision support, practitioners can run Gemma 3 27B IT locally to perform OCR, chart analysis, and visual reasoning without sending sensitive documents to a cloud API. This is particularly useful for:
The model’s support for 140+ languages makes it ideal for localization tasks and multilingual summarization. It can follow complex, multi-step instructions in languages ranging from Spanish and French to more niche dialects, maintaining high coherence and formatting accuracy (e.g., outputting valid JSON or Markdown).
To run Gemma 3 27B IT locally, your primary constraint will be Video RAM (VRAM). Because it is a 27B parameter model, the memory footprint is too large for entry-level consumer GPUs in its uncompressed state, but it is highly optimized for high-end consumer hardware when quantized.
The amount of VRAM required depends entirely on your chosen quantization level. Quantization reduces the precision of the model weights (e.g., from 16-bit to 4-bit) to save memory with minimal loss in accuracy.
For the best experience, we recommend the following hardware configurations:
The fastest way to test Gemma 3 27B IT performance on your machine is via Ollama. Once installed, you can pull the model with a single command:
ollama run gemma3:27b
Ollama automatically handles the quantization and memory management, offloading layers to your GPU based on your available VRAM.
When evaluating the local AI model 27B parameters 2025 landscape, Gemma 3 27B IT sits in a unique position between the lightweight Llama 3.1 8B and the heavyweight Llama 3.1 70B.
Gemma 3 27B IT represents the current ceiling for what is possible on a single-GPU consumer workstation. For developers who need a "daily driver" model that can see, code, and reason across 140 languages without a cloud subscription, this is the definitive choice for 2025.