Google

Gemma 3 27B IT

Google's most capable open-weight single-GPU model. Dense 27B multimodal with 128K context, 140+ languages. Beats Gemini 1.5 Pro on several benchmarks.

27B paramsDense128K ctxText + Vision

View on Hugging Face

Run with Ollama Official Page

Our Take

Best for: Workstation-class chat and code with usable context

A workable 27B-parameter dense language model from Google. A pragmatic middle-ground choice when you need open weights without a flagship-sized footprint. Currently trending on Hugging Face — community interest is climbing.

Run this onApple M4 Pro (14-core CPU, 20-core GPU)Cheapest card in our directory with comfortable headroom (64 GB) for this model at Q4 (~43.8 GB).

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Capabilities

Chat

Code Generation

Vision

Multilingual

Summarization

Instruction Following

Model Specifications

Parameters27B

ArchitectureDense

Context Length128K tokens

ModalityText + Vision

Training CutoffAugust 2024

ProviderGoogle

Download Size179.8 GB

Community

Monthly Downloads1.0M

Likes2.0K

Last Updated1 years ago

Quick Start

Run with Ollama

Copy and paste this command to start running the model locally.

ollama run gemma3:27b

Download from Hugging Face

Access model weights, configuration files, and documentation.

Download from Hugging Face

License

Gemma Terms of UseView Full License

Performance & Scoring

Benchmarks

SWE-Pro

11.4

Arena Score

78.7

Overall Score

52.8CC

Benchmark40%

45.0

Popularity25%

67.0

Efficiency20%

39.3

Versatility15%

68.0

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality
Q2_K	38.1 GB	Low	Aggressive quantization — smallest size, noticeable quality loss
Q4_K_MRecommended	43.8 GB	Good	Best balance of size and quality for most use-cases
Q5_K_M	46.5 GB	Very Good	Slightly better quality than Q4 with moderate size increase
Q6_K	49.7 GB	Excellent	Near-lossless quality with manageable size
Q8_0	56.5 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	82.1 GB	Full	Full 16-bit floating point — maximum quality, largest size

Hardware Compatibility

See which devices can run this model and at what quality level.

Hide F tierOnly featured devices

102 devices


NVIDIA H100 SXM5 80GBNVIDIA	SS	61.6 tok/s	43.8 GB
Google Cloud TPU v5pGoogle	SS	50.8 tok/s	43.8 GB
Intel Gaudi 2 AI AcceleratorIntel	SS	45.0 tok/s	43.8 GB
NVIDIA A100 SXM4 80GBNVIDIA	SS	37.5 tok/s	43.8 GB
Intel Gaudi 3 AI AcceleratorIntel	SS	68.0 tok/s	43.8 GB
NVIDIA H200 SXM 141GBNVIDIA	SS	88.2 tok/s	43.8 GB
AMD Instinct MI300XAMD	SS	97.4 tok/s	43.8 GB
Google TPU v7 (Ironwood)Google	SS	135.6 tok/s	43.8 GB
NVIDIA B200 GPUNVIDIA	SS	147.0 tok/s	43.8 GB
AMD Instinct MI325XAMD	SS	110.3 tok/s	43.8 GB
AMD Instinct MI355XAMD	SS	147.0 tok/s	43.8 GB
ASUS ExpertCenter Pro ET900N G3ASUS	SS	130.5 tok/s	43.8 GB
Dell Pro Max with GB300Dell	SS	130.5 tok/s	43.8 GB
Gigabyte W775-V10-L01Gigabyte	SS	130.5 tok/s	43.8 GB
HP ZGX Fury AI StationHP	SS	130.5 tok/s	43.8 GB
MSI XpertStation WS300MSI	SS	130.5 tok/s	43.8 GB
SuperMicro Super AI StationSuperMicro	SS	130.5 tok/s	43.8 GB
Apple Mac Studio (M1 Ultra, 2022)Apple	AA	14.7 tok/s	43.8 GB
Corsair AI Workstation 300 (Ryzen AI Max+ 395)Corsair	BB	9.4 tok/s	43.8 GB
Apple Mac Studio (M1 Max, 2022)Apple	BB	7.4 tok/s	43.8 GB
Apple Mac Studio (M2 Ultra, 2023)Apple	BB	14.7 tok/s	43.8 GB
Apple M5 Max (18-core CPU, 40-core GPU)Apple	BB	11.3 tok/s	43.8 GB
MacBook Pro 16-inch M5 Max (2026)Apple	BB	11.3 tok/s	43.8 GB
MacBook Pro 16" M5 Max (2026)Apple	BB	11.3 tok/s	43.8 GB
Apple Mac Studio (M2 Max, 2023)Apple	BB	7.4 tok/s	43.8 GB

Rows per page

Page 1 of 5

Run Locally vs API

Energy cost on Corsair AI Workstation 300 (Ryzen AI Max 385) (~4.7 tok/s, Q4_K_M) vs flagship API pricing.

Source	Cost per 1M tokens
Local (energy only)Gemma 3 27B IT on Corsair AI Workstation 300 (Ryzen AI Max 385) · ~4.7 tok/s · 150W	$1.06
GPT-5.5OpenAI · in $5.00 · out $30.00	$12.50
Claude Opus 4.7 ThinkingAnthropic · in $5.00 · out $25.00	$11.00
Gemini 3.5 FlashGoogle · in $1.50 · out $9.00	$3.75
Grok 4.3xAI · in $1.25 · out $2.50	$1.63

API prices blended at 70% input / 30% output.

Hardware amortisation not included. Run the full ROI calculator for payback math.

Run the full ROI calculator

Rent in the Cloud

Cheapest current cloud rentals with at least 44 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA A100 80GB SXMVast.ai · On-Demand · 80 GB VRAM	$0.27
AMD Instinct MI300XRunPod · Community · 192 GB VRAM	$0.50
NVIDIA H200 NVLRunPod · Community · 141 GB VRAM	$0.50
NVIDIA A100 80GB SXMVast.ai · Spot · 80 GB VRAM	$0.53
NVIDIA L40RunPod · Community · 48 GB VRAM	$0.69

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.

See the full price index

About This Model

Gemma 3 27B IT is Google’s premier open-weight model designed for high-performance local inference on single-GPU workstations. As a dense 27-billion parameter model, it occupies the "Goldilocks" zone of local AI: it is significantly more capable than 8B-class models while remaining far more accessible to run than 70B+ models that typically require multi-GPU setups.

Built on the same technological foundations as the Gemini family, Gemma 3 27B IT is a multimodal, instruction-tuned model that supports both text and vision inputs. It features a massive 128,000-token context window and has been trained on a dataset with a cutoff of August 2024. Most notably, this model achieves parity with or exceeds Gemini 1.5 Pro on several key benchmarks, making it the most capable model in its weight class for practitioners who prioritize local data privacy and low-latency response times.

Architecture and Technical Details

Unlike Mixture-of-Experts (MoE) architectures that only activate a fraction of their parameters during inference, Gemma 3 27B IT utilizes a dense architecture. In a dense model, all 27 billion parameters are active for every token generated. While this requires more compute per token than an MoE model of a similar total size, it generally results in higher "intelligence per parameter" and more stable reasoning capabilities.

Context Window and Multimodality

The 128K context window is a significant upgrade over previous generations, allowing developers to feed entire codebases, long technical documents, or complex image-heavy PDFs into the model for local processing. Because the model is natively multimodal, it does not rely on a separate vision encoder "bolted on" to a text model; the vision and text capabilities are integrated into the core architecture, leading to more cohesive reasoning across different data types.

Training and Multilingual Support

Google expanded the training recipe for Gemma 3 to include support for over 140 languages. This makes it one of the most robust multilingual open-weight models available. The training data emphasizes high-quality reasoning, mathematics, and programming, which is reflected in its instruction-following accuracy.

Capabilities and Use Cases

Gemma 3 27B IT is designed for complex workflows that require a high degree of nuance. It is not just a chatbot; it is a reasoning engine capable of handling structured data and visual inputs simultaneously.

Gemma 3 27B IT for Coding

This model is a top-tier choice for local coding assistance. It excels at:

Repository-wide Refactoring: Utilizing the 128K context to understand cross-file dependencies.
Code Explanation: Breaking down complex C++, Rust, or Python snippets into architectural logic.
Unit Test Generation: Writing comprehensive test suites based on existing implementation files.
Visual Debugging: Analyzing screenshots of UI bugs or stack traces to suggest CSS or logic fixes.

Vision and Document Analysis

With native vision support, practitioners can run Gemma 3 27B IT locally to perform OCR, chart analysis, and visual reasoning without sending sensitive documents to a cloud API. This is particularly useful for:

Financial Analysis: Processing scanned quarterly reports and extracting data from complex tables.
Technical Manuals: Answering questions based on circuit diagrams or architectural blueprints.
Automated Tagging: Generating descriptive metadata for large local image libraries.

Multilingual Instruction-Following

The model’s support for 140+ languages makes it ideal for localization tasks and multilingual summarization. It can follow complex, multi-step instructions in languages ranging from Spanish and French to more niche dialects, maintaining high coherence and formatting accuracy (e.g., outputting valid JSON or Markdown).

Running Gemma 3 27B IT Locally

To run Gemma 3 27B IT locally, your primary constraint will be Video RAM (VRAM). Because it is a 27B parameter model, the memory footprint is too large for entry-level consumer GPUs in its uncompressed state, but it is highly optimized for high-end consumer hardware when quantized.

Gemma 3 27B IT VRAM Requirements

The amount of VRAM required depends entirely on your chosen quantization level. Quantization reduces the precision of the model weights (e.g., from 16-bit to 4-bit) to save memory with minimal loss in accuracy.

FP16 (Uncompressed): ~54 GB VRAM. Requires professional hardware like an NVIDIA A6000 or A100.
Q8_0 (8-bit): ~29 GB VRAM. Requires a multi-GPU setup (e.g., 2x RTX 3090/4090) or a Mac Studio with Unified Memory.
Q4_K_M (4-bit Recommended): ~17–19 GB VRAM. This is the best quantization for Gemma 3 27B IT for most practitioners. It fits comfortably on a single RTX 3090 or RTX 4090 (24GB) with enough headroom for the 128K context window.
Q3_K_M (3-bit): ~12–14 GB VRAM. Can run on an RTX 4080 (16GB) or RTX 3060 (12GB) with a restricted context window.

Gemma 3 27B IT Hardware Requirements

For the best experience, we recommend the following hardware configurations:

NVIDIA Users: An RTX 4090 24GB is the best GPU for Gemma 3 27B IT. It provides enough VRAM for 4-bit quantization while offering the highest Gemma 3 27B IT tokens per second (typically 40–60 t/s depending on the backend).
Mac Users: Any Apple Silicon Mac with at least 32GB of Unified Memory (M2/M3/M4 Max or Ultra) will run this model exceptionally well.
Linux/Workstation: If you are building a dedicated local AI rig, a dual RTX 3090 setup (48GB total VRAM) allows you to run the model at Q8_0 precision with the full 128K context window active.

Quick Start with Ollama

The fastest way to test Gemma 3 27B IT performance on your machine is via Ollama. Once installed, you can pull the model with a single command:

ollama run gemma3:27b

Ollama automatically handles the quantization and memory management, offloading layers to your GPU based on your available VRAM.

How It Compares

When evaluating the local AI model 27B parameters 2025 landscape, Gemma 3 27B IT sits in a unique position between the lightweight Llama 3.1 8B and the heavyweight Llama 3.1 70B.

Gemma 3 27B IT vs. Llama 3.1 8B

Intelligence: Gemma 3 27B is significantly more capable at complex reasoning, coding, and multilingual tasks.
Speed: Llama 3.1 8B is much faster and can run on almost any modern laptop.
Use Case: Choose Llama 8B for simple chat or basic summarization; choose Gemma 27B for coding, vision, and "smart" agentic workflows.

Gemma 3 27B IT vs. Llama 3.1 70B

Hardware Accessibility: Gemma 3 27B fits on a single consumer GPU (RTX 3090/4090). Llama 3.1 70B requires at least two 24GB GPUs or a high-end Mac to run at usable speeds.
Performance: While Llama 3.1 70B has a slight edge in pure knowledge retrieval, Gemma 3 27B’s performance on benchmarks often matches or beats 70B-class models in coding and mathematics.
Modality: Gemma 3 27B IT has native vision capabilities, whereas Llama 3.1 is primarily a text-only model (unless using specific multimodal variants like Llama 3.2).

Gemma 3 27B IT vs. Mistral-Nemo 12B

Capacity: Gemma 3 27B is more than double the size of Mistral-Nemo and offers noticeably better instruction-following and creative writing.
Context: Both offer large context windows, but Gemma’s 128K is more robust for extremely large document sets compared to Mistral-Nemo’s 128K implementation, which can see quality degradation at the extreme edges.

Gemma 3 27B IT represents the current ceiling for what is possible on a single-GPU consumer workstation. For developers who need a "daily driver" model that can see, code, and reason across 140 languages without a cloud subscription, this is the definitive choice for 2025.

Related Models

Google

Explore the Provider

See all Google models

Aggregate stats, leaderboard, release timeline, and benchmark coverage across every Google model we track.

Open Google

Explore the Family

See every Gemma release

The full Gemma family leaderboard with sizes, benchmark scores, and a release timeline.

Open Gemma

Find the Best Hardware for This Model

Use our hardware calculator to find the optimal device for running this model.

Google

Gemma 3 27B IT

Google's most capable open-weight single-GPU model. Dense 27B multimodal with 128K context, 140+ languages. Beats Gemini 1.5 Pro on several benchmarks.

27B paramsDense128K ctxText + Vision

View on Hugging Face

Run with Ollama Official Page

Our Take

Best for: Workstation-class chat and code with usable context

Run this onApple M4 Pro (14-core CPU, 20-core GPU)Cheapest card in our directory with comfortable headroom (64 GB) for this model at Q4 (~43.8 GB).

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Capabilities

Chat

Code Generation

Vision

Multilingual

Summarization

Instruction Following

Model Specifications

Parameters27B

ArchitectureDense

Context Length128K tokens

ModalityText + Vision

Training CutoffAugust 2024

ProviderGoogle

Download Size179.8 GB

Community

Monthly Downloads1.0M

Likes2.0K

Last Updated1 years ago

Quick Start

Run with Ollama

Copy and paste this command to start running the model locally.

ollama run gemma3:27b

Download from Hugging Face

Access model weights, configuration files, and documentation.

Download from Hugging Face

License

Gemma Terms of UseView Full License

Performance & Scoring

Benchmarks

SWE-Pro

11.4

Arena Score

78.7

Overall Score

52.8CC

Benchmark40%

45.0

Popularity25%

67.0

Efficiency20%

39.3

Versatility15%

68.0

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality
Q2_K	38.1 GB	Low	Aggressive quantization — smallest size, noticeable quality loss
Q4_K_MRecommended	43.8 GB	Good	Best balance of size and quality for most use-cases
Q5_K_M	46.5 GB	Very Good	Slightly better quality than Q4 with moderate size increase
Q6_K	49.7 GB	Excellent	Near-lossless quality with manageable size
Q8_0	56.5 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	82.1 GB	Full	Full 16-bit floating point — maximum quality, largest size

Hardware Compatibility

See which devices can run this model and at what quality level.

Hide F tierOnly featured devices

102 devices


NVIDIA H100 SXM5 80GBNVIDIA	SS	61.6 tok/s	43.8 GB
Google Cloud TPU v5pGoogle	SS	50.8 tok/s	43.8 GB
Intel Gaudi 2 AI AcceleratorIntel	SS	45.0 tok/s	43.8 GB
NVIDIA A100 SXM4 80GBNVIDIA	SS	37.5 tok/s	43.8 GB
Intel Gaudi 3 AI AcceleratorIntel	SS	68.0 tok/s	43.8 GB
NVIDIA H200 SXM 141GBNVIDIA	SS	88.2 tok/s	43.8 GB
AMD Instinct MI300XAMD	SS	97.4 tok/s	43.8 GB
Google TPU v7 (Ironwood)Google	SS	135.6 tok/s	43.8 GB
NVIDIA B200 GPUNVIDIA	SS	147.0 tok/s	43.8 GB
AMD Instinct MI325XAMD	SS	110.3 tok/s	43.8 GB
AMD Instinct MI355XAMD	SS	147.0 tok/s	43.8 GB
ASUS ExpertCenter Pro ET900N G3ASUS	SS	130.5 tok/s	43.8 GB
Dell Pro Max with GB300Dell	SS	130.5 tok/s	43.8 GB
Gigabyte W775-V10-L01Gigabyte	SS	130.5 tok/s	43.8 GB
HP ZGX Fury AI StationHP	SS	130.5 tok/s	43.8 GB
MSI XpertStation WS300MSI	SS	130.5 tok/s	43.8 GB
SuperMicro Super AI StationSuperMicro	SS	130.5 tok/s	43.8 GB
Apple Mac Studio (M1 Ultra, 2022)Apple	AA	14.7 tok/s	43.8 GB
Corsair AI Workstation 300 (Ryzen AI Max+ 395)Corsair	BB	9.4 tok/s	43.8 GB
Apple Mac Studio (M1 Max, 2022)Apple	BB	7.4 tok/s	43.8 GB
Apple Mac Studio (M2 Ultra, 2023)Apple	BB	14.7 tok/s	43.8 GB
Apple M5 Max (18-core CPU, 40-core GPU)Apple	BB	11.3 tok/s	43.8 GB
MacBook Pro 16-inch M5 Max (2026)Apple	BB	11.3 tok/s	43.8 GB
MacBook Pro 16" M5 Max (2026)Apple	BB	11.3 tok/s	43.8 GB
Apple Mac Studio (M2 Max, 2023)Apple	BB	7.4 tok/s	43.8 GB

Rows per page

Page 1 of 5

Run Locally vs API

Energy cost on Corsair AI Workstation 300 (Ryzen AI Max 385) (~4.7 tok/s, Q4_K_M) vs flagship API pricing.

Source	Cost per 1M tokens
Local (energy only)Gemma 3 27B IT on Corsair AI Workstation 300 (Ryzen AI Max 385) · ~4.7 tok/s · 150W	$1.06
GPT-5.5OpenAI · in $5.00 · out $30.00	$12.50
Claude Opus 4.7 ThinkingAnthropic · in $5.00 · out $25.00	$11.00
Gemini 3.5 FlashGoogle · in $1.50 · out $9.00	$3.75
Grok 4.3xAI · in $1.25 · out $2.50	$1.63

API prices blended at 70% input / 30% output.

Hardware amortisation not included. Run the full ROI calculator for payback math.

Run the full ROI calculator

Rent in the Cloud

Cheapest current cloud rentals with at least 44 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA A100 80GB SXMVast.ai · On-Demand · 80 GB VRAM	$0.27
AMD Instinct MI300XRunPod · Community · 192 GB VRAM	$0.50
NVIDIA H200 NVLRunPod · Community · 141 GB VRAM	$0.50
NVIDIA A100 80GB SXMVast.ai · Spot · 80 GB VRAM	$0.53
NVIDIA L40RunPod · Community · 48 GB VRAM	$0.69

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.

See the full price index

About This Model

Architecture and Technical Details

Context Window and Multimodality

Training and Multilingual Support

Capabilities and Use Cases

Gemma 3 27B IT for Coding

This model is a top-tier choice for local coding assistance. It excels at:

Repository-wide Refactoring: Utilizing the 128K context to understand cross-file dependencies.
Code Explanation: Breaking down complex C++, Rust, or Python snippets into architectural logic.
Unit Test Generation: Writing comprehensive test suites based on existing implementation files.
Visual Debugging: Analyzing screenshots of UI bugs or stack traces to suggest CSS or logic fixes.

Vision and Document Analysis

Financial Analysis: Processing scanned quarterly reports and extracting data from complex tables.
Technical Manuals: Answering questions based on circuit diagrams or architectural blueprints.
Automated Tagging: Generating descriptive metadata for large local image libraries.

Multilingual Instruction-Following

Running Gemma 3 27B IT Locally

Gemma 3 27B IT VRAM Requirements

FP16 (Uncompressed): ~54 GB VRAM. Requires professional hardware like an NVIDIA A6000 or A100.
Q8_0 (8-bit): ~29 GB VRAM. Requires a multi-GPU setup (e.g., 2x RTX 3090/4090) or a Mac Studio with Unified Memory.
Q4_K_M (4-bit Recommended): ~17–19 GB VRAM. This is the best quantization for Gemma 3 27B IT for most practitioners. It fits comfortably on a single RTX 3090 or RTX 4090 (24GB) with enough headroom for the 128K context window.
Q3_K_M (3-bit): ~12–14 GB VRAM. Can run on an RTX 4080 (16GB) or RTX 3060 (12GB) with a restricted context window.

Gemma 3 27B IT Hardware Requirements

For the best experience, we recommend the following hardware configurations:

NVIDIA Users: An RTX 4090 24GB is the best GPU for Gemma 3 27B IT. It provides enough VRAM for 4-bit quantization while offering the highest Gemma 3 27B IT tokens per second (typically 40–60 t/s depending on the backend).
Mac Users: Any Apple Silicon Mac with at least 32GB of Unified Memory (M2/M3/M4 Max or Ultra) will run this model exceptionally well.
Linux/Workstation: If you are building a dedicated local AI rig, a dual RTX 3090 setup (48GB total VRAM) allows you to run the model at Q8_0 precision with the full 128K context window active.

Quick Start with Ollama

The fastest way to test Gemma 3 27B IT performance on your machine is via Ollama. Once installed, you can pull the model with a single command:

ollama run gemma3:27b

Ollama automatically handles the quantization and memory management, offloading layers to your GPU based on your available VRAM.

How It Compares

When evaluating the local AI model 27B parameters 2025 landscape, Gemma 3 27B IT sits in a unique position between the lightweight Llama 3.1 8B and the heavyweight Llama 3.1 70B.

Gemma 3 27B IT vs. Llama 3.1 8B

Intelligence: Gemma 3 27B is significantly more capable at complex reasoning, coding, and multilingual tasks.
Speed: Llama 3.1 8B is much faster and can run on almost any modern laptop.
Use Case: Choose Llama 8B for simple chat or basic summarization; choose Gemma 27B for coding, vision, and "smart" agentic workflows.

Gemma 3 27B IT vs. Llama 3.1 70B

Hardware Accessibility: Gemma 3 27B fits on a single consumer GPU (RTX 3090/4090). Llama 3.1 70B requires at least two 24GB GPUs or a high-end Mac to run at usable speeds.
Performance: While Llama 3.1 70B has a slight edge in pure knowledge retrieval, Gemma 3 27B’s performance on benchmarks often matches or beats 70B-class models in coding and mathematics.
Modality: Gemma 3 27B IT has native vision capabilities, whereas Llama 3.1 is primarily a text-only model (unless using specific multimodal variants like Llama 3.2).

Gemma 3 27B IT vs. Mistral-Nemo 12B

Capacity: Gemma 3 27B is more than double the size of Mistral-Nemo and offers noticeably better instruction-following and creative writing.
Context: Both offer large context windows, but Gemma’s 128K is more robust for extremely large document sets compared to Mistral-Nemo’s 128K implementation, which can see quality degradation at the extreme edges.