1-liter AI workstation with Intel Core Ultra 5 235 vPro (Arrow Lake), Intel Arc iGPU, 32GB DDR5, and a 1TB PCIe Gen5 NVMe SSD. Built-in AI Boost NPU and enterprise vPro management. Fleet-friendly business AI desktop.
8 GB will run a 7B Q4 quant and most embedding models, but the KV cache budget is tight. Better as a stepping stone than a long-term home for AI work.
The Lenovo ThinkCentre P3 Tiny Gen 2 (Ultra 5 235) is a 1-liter AI workstation designed for practitioners who need to run local inference in space-constrained, fleet-managed environments. It’s not a gaming rig or a data-center GPU server—it’s a production-ready edge device that fits on a desk, behind a monitor, or inside a kiosk. Lenovo positions it as a business-class tiny desktop with enterprise manageability (vPro) and a built-in NPU for lightweight AI acceleration.
Priced at $1,299 MSRP, it competes directly with other compact AI-capable systems like the Apple Mac Mini M4 (base) and Intel NUC 13 Extreme, but with a distinct advantage: Lenovo’s ThinkShield security, vPro remote management, and a form factor that’s built for deployment at scale. For teams running agentic workflows, edge inference, or local LLM chatbots where physical footprint and IT control matter, this machine hits a sweet spot between performance and practicality.
The inclusion of an Intel Core Ultra 5 235 vPro (Arrow Lake) with an integrated Intel Arc iGPU and an AI Boost NPU (~13 TOPS) means you get three compute engines in one chassis. The NPU handles lightweight, continuous AI tasks (like voice wake-up or real-time transcription) without taxing the CPU or GPU, while the iGPU supplies the compute for larger models, drawing on shared system memory rather than dedicated VRAM. This isn't a high-end workstation for training, but for inference it's a capable, low-power workhorse.
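If you want to confirm all three engines are visible to your inference stack, OpenVINO can enumerate them. A minimal sketch, assuming the openvino Python package is installed:

```python
# Quick check that the CPU, Arc iGPU, and AI Boost NPU are all visible
# to the OpenVINO runtime (device names are OpenVINO's own identifiers).
from openvino import Core

core = Core()
for device in core.available_devices:          # typically ['CPU', 'GPU', 'NPU']
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```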
The specs that matter for AI inference are straightforward on this machine. Here’s the breakdown:
The 8 GB of shared memory the iGPU can address (carved out of the 32 GB of system DDR5) is the limiting factor. It's not dedicated VRAM like an NVIDIA RTX card's, but it's sufficient for running quantized 7B–8B parameter models comfortably. The 90 GB/s memory bandwidth is modest: roughly a third of a desktop RTX 4060's 272 GB/s or an Apple M4 Pro's 273 GB/s. This directly impacts token generation speed: expect 15–25 tokens per second for a 7B–8B Q4 model, depending on prompt length and batch size.
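Since bandwidth is the bottleneck, the decode-speed ceiling is easy to estimate: every generated token streams the active weights through memory once, so bandwidth divided by model size bounds tokens per second. A back-of-envelope sketch using the figures above (the 4.8 GB weight size is the Q4 7B figure from the compatibility table below):

```python
# Back-of-envelope decode ceiling: each token streams the full set of
# active weights from memory, so bandwidth / model size bounds tok/s.
BANDWIDTH_GB_S = 90        # shared DDR5 bandwidth from the spec sheet
WEIGHTS_GB = 4.8           # 7B model at Q4 (per the compatibility table)

print(f"ceiling = {BANDWIDTH_GB_S / WEIGHTS_GB:.1f} tok/s")  # ~18.8 tok/s
```

Observed speeds hover around this ceiling; smaller quantizations raise it, and long contexts (extra KV cache reads) pull it down.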
The Intel Arc iGPU supports INT8 and FP16 operations via XMX (Xe Matrix Extensions). For inference frameworks like llama.cpp, ONNX Runtime, or OpenVINO, you can leverage the iGPU for acceleration. The NPU (Intel AI Boost) is best suited for lightweight, always-on tasks—don’t expect it to run large language models. Combined, the system delivers about 20–25 TOPS of effective INT8 compute, which is competitive at the 65W power envelope.
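As a concrete starting point, here's a minimal generation sketch with OpenVINO GenAI targeting the Arc iGPU. It assumes the openvino-genai package and a model already converted to OpenVINO IR (for example via optimum-cli); the local model path is hypothetical:

```python
# Minimal iGPU-accelerated generation with OpenVINO GenAI. Assumes the
# openvino-genai package and a model already exported to OpenVINO IR;
# the local path below is hypothetical.
import openvino_genai

pipe = openvino_genai.LLMPipeline("./llama-3.1-8b-int4-ov", "GPU")  # "GPU" = Arc iGPU
print(pipe.generate("Explain vPro in one sentence.", max_new_tokens=64))
```

Swapping "GPU" for "CPU" retargets the same pipeline; recent OpenVINO releases also expose the NPU as a device, though it's best reserved for the small, always-on workloads described above.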
Power efficiency is a standout. At idle, the P3 Tiny draws around 15–20W; under full AI load, it peaks near 65W. For always-on inference servers or edge deployments where electricity cost and heat dissipation matter, this is a significant advantage over a desktop with a discrete GPU.
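To put that in dollar terms, a quick calculation of always-on serving cost using the review's power and throughput figures; the electricity rate is an assumption:

```python
# Always-on serving economics from the review's figures; the electricity
# rate is an assumption, swap in your own.
LOAD_WATTS = 65            # peak AI load per the review
TOK_PER_SEC = 20           # mid-range 7B Q4 throughput
USD_PER_KWH = 0.15         # assumed utility rate

hours_per_mtok = 1_000_000 / TOK_PER_SEC / 3600
kwh_per_mtok = hours_per_mtok * LOAD_WATTS / 1000
print(f"{kwh_per_mtok:.2f} kWh (~${kwh_per_mtok * USD_PER_KWH:.2f}) per 1M tokens")
```

At roughly 0.9 kWh per million tokens, electricity is a rounding error even for a node generating around the clock.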
This is where the P3 Tiny’s capability becomes concrete. Based on the 8 GB shared memory and 90 GB/s bandwidth, here’s the practical model compatibility:
Sweet spot: 7B–8B models at Q4_K_M quantization. That gives the best quality-to-speed tradeoff on this hardware. For developers running local chatbots, RAG pipelines, or small agentic workflows, this is the configuration to target.
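A model qualifies for that sweet spot when quantized weights, KV cache, and runtime overhead all fit inside the ~8 GB budget. Here's a rough fit-check sketch; the 4.5 bits-per-weight figure for Q4_K_M and the 0.8 GB overhead constant are assumptions, and the layer/head shape used is Llama-2-7B's:

```python
def fits(params_b: float, bits: float, n_layers: int, kv_heads: int,
         head_dim: int, ctx: int, budget_gb: float = 8.0) -> bool:
    """Return True if a model's working set fits the shared-memory budget."""
    weights_gb = params_b * bits / 8                     # 7B at 4.5 bpw ~ 3.9 GB
    # FP16 KV cache: 2 tensors (K, V) x 2 bytes x layers x kv_heads x head_dim x ctx
    kv_gb = 2 * 2 * n_layers * kv_heads * head_dim * ctx / 1e9
    overhead_gb = 0.8                                    # assumed runtime/activation slack
    return weights_gb + kv_gb + overhead_gb <= budget_gb

# Llama-2-7B shape: 32 layers, 32 KV heads (no GQA), head_dim 128, 4K context
print(fits(params_b=7, bits=4.5, n_layers=32, kv_heads=32,
           head_dim=128, ctx=4096))                      # True, barely
```

Doubling the context to 8K roughly doubles the KV term to ~4.3 GB and the check fails, which is exactly the tight KV cache budget the editor note flags.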
The Lenovo ThinkCentre P3 Tiny Gen 2 is not for everyone. Here's how to decide whether it fits your workload:
Inference vs. training: This is a pure inference device. If you need to fine-tune even a 7B model, look elsewhere (RTX 4090 or cloud GPU). But for running pre-trained models, the P3 Tiny delivers reliable, low-latency performance.
Two realistic alternatives at a similar price/performance tier:
- Apple Mac Mini M4 (16GB unified memory, $599 base)
- Intel NUC 13 Extreme (Core i7-13700K + optional RTX 4060, ~$1,500)
Where the P3 Tiny wins: Enterprise manageability, silent operation, 65W TDP, and a form factor that can be mounted anywhere. For teams that need to deploy 50+ inference nodes with centralized IT control, the Lenovo is the clear choice. For raw token-per-second performance, the Mac Mini or NUC will beat it—but they lack the fleet-friendly features that make the P3 Tiny a production-ready AI PC for business.
| Model | Vendor | Params | Grade | Speed | Memory required |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | A | 13.5 tok/s | 5.4 GB |
| | | 8B | B | 12.8 tok/s | 5.7 GB |
| | | 9B | B | 12.0 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | B | 15.1 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | B | 19.5 tok/s | 3.7 GB |
| Mistral 7B Instruct | Mistral AI | 7B | B | 11.3 tok/s | 6.4 GB |
| Gemma 4 E4B IT | Google | 4B | C | 10.5 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | C | 10.5 tok/s | 6.9 GB |
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | D | 8.5 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | D | 8.5 tok/s | 8.5 GB |
| Llama 2 13B Chat | Meta | 13B | D | 8.6 tok/s | 8.5 GB |
| | | 8B | F | 5.4 tok/s | 13.3 GB |
| Qwen3.5-9B | Alibaba | 9B | F | 2.9 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 1.9 tok/s | 39.0 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | F | 6.6 tok/s | 11.0 GB |
| Qwen3.6-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 1.7 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 0.9 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba | 32.8B | F | 1.3 tok/s | 53.9 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | F | 3.0 tok/s | 24.4 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | F | 6.4 tok/s | 11.4 GB |
| LLaMA 65B | Meta | 65B | F | 1.8 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | F | 1.7 tok/s | 43.4 GB |
| | | 70B | F | 1.6 tok/s | 45.7 GB |