1-liter AI workstation with Intel Core Ultra 5 235 vPro (Arrow Lake), Intel Arc iGPU, 32GB DDR5, and a 1TB PCIe Gen5 NVMe SSD. Built-in AI Boost NPU and enterprise vPro management. Fleet-friendly business AI desktop.
8 GB will run a 7B Q4 quant and most embedding models, but the KV cache budget is tight. Better as a stepping stone than a long-term home for AI work.
The Lenovo ThinkCentre P3 Tiny Gen 2 (Ultra 5 235) is a 1-liter AI workstation designed for practitioners who need to run local inference in space-constrained, fleet-managed environments. It’s not a gaming rig or a data-center GPU server—it’s a production-ready edge device that fits on a desk, behind a monitor, or inside a kiosk. Lenovo positions it as a business-class tiny desktop with enterprise manageability (vPro) and a built-in NPU for lightweight AI acceleration.
Priced at $1,299 MSRP, it competes directly with other compact AI-capable systems like the Apple Mac Mini M4 (base) and Intel NUC 13 Extreme, but with a distinct advantage: Lenovo’s ThinkShield security, vPro remote management, and a form factor that’s built for deployment at scale. For teams running agentic workflows, edge inference, or local LLM chatbots where physical footprint and IT control matter, this machine hits a sweet spot between performance and practicality.
The inclusion of an Intel Core Ultra 5 235 vPro (Arrow Lake) with an integrated Intel Arc iGPU and an AI Boost NPU (~13 TOPS) means you get three compute engines in one chassis. The NPU handles lightweight, continuous AI tasks (like voice wake-up or real-time transcription) without taxing the CPU or GPU, while the iGPU supplies the compute for larger models, drawing on shared system memory rather than dedicated VRAM. This isn't a high-end workstation for training, but for inference it's a capable, low-power workhorse.
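If you want to confirm all three engines are visible to your inference stack, OpenVINO can enumerate them. A minimal sketch, assuming the openvino Python package is installed:

```python
# Quick check that the CPU, Arc iGPU, and AI Boost NPU are all visible
# to the OpenVINO runtime (device names are OpenVINO's own identifiers).
from openvino import Core

core = Core()
for device in core.available_devices:          # typically ['CPU', 'GPU', 'NPU']
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```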
The specs that matter for AI inference are straightforward on this machine. Here’s the breakdown:
The 8 GB of shared memory the iGPU can address (carved out of the 32 GB of system DDR5) is the limiting factor. It's not dedicated VRAM like an NVIDIA RTX card's, but it's sufficient for running quantized 7B–8B parameter models comfortably. The 90 GB/s memory bandwidth is modest: roughly a third of a desktop RTX 4060's 272 GB/s or an Apple M4 Pro's 273 GB/s. This directly impacts token generation speed: expect 15–25 tokens per second for a 7B–8B Q4 model, depending on prompt length and batch size.
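Since bandwidth is the bottleneck, the decode-speed ceiling is easy to estimate: every generated token streams the active weights through memory once, so bandwidth divided by model size bounds tokens per second. A back-of-envelope sketch using the figures above (the 4.8 GB weight size is the Q4 7B figure from the compatibility table below):

```python
# Back-of-envelope decode ceiling: each token streams the full set of
# active weights from memory, so bandwidth / model size bounds tok/s.
BANDWIDTH_GB_S = 90        # shared DDR5 bandwidth from the spec sheet
WEIGHTS_GB = 4.8           # 7B model at Q4 (per the compatibility table)

print(f"ceiling = {BANDWIDTH_GB_S / WEIGHTS_GB:.1f} tok/s")  # ~18.8 tok/s
```

Observed speeds hover around this ceiling; smaller quantizations raise it, and long contexts (extra KV cache reads) pull it down.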
The Intel Arc iGPU supports INT8 and FP16 operations via XMX (Xe Matrix Extensions). For inference frameworks like llama.cpp, ONNX Runtime, or OpenVINO, you can leverage the iGPU for acceleration. The NPU (Intel AI Boost) is best suited for lightweight, always-on tasks—don’t expect it to run large language models. Combined, the system delivers about 20–25 TOPS of effective INT8 compute, which is competitive at the 65W power envelope.
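As a concrete starting point, here's a minimal generation sketch with OpenVINO GenAI targeting the Arc iGPU. It assumes the openvino-genai package and a model already converted to OpenVINO IR (for example via optimum-cli); the local model path is hypothetical:

```python
# Minimal iGPU-accelerated generation with OpenVINO GenAI. Assumes the
# openvino-genai package and a model already exported to OpenVINO IR;
# the local path below is hypothetical.
import openvino_genai

pipe = openvino_genai.LLMPipeline("./llama-3.1-8b-int4-ov", "GPU")  # "GPU" = Arc iGPU
print(pipe.generate("Explain vPro in one sentence.", max_new_tokens=64))
```

Swapping "GPU" for "CPU" retargets the same pipeline; recent OpenVINO releases also expose the NPU as a device, though it's best reserved for the small, always-on workloads described above.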
Power efficiency is a standout. At idle, the P3 Tiny draws around 15–20W; under full AI load, it peaks near 65W. For always-on inference servers or edge deployments where electricity cost and heat dissipation matter, this is a significant advantage over a desktop with a discrete GPU.
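To put that in dollar terms, a quick calculation of always-on serving cost using the review's power and throughput figures; the electricity rate is an assumption:

```python
# Always-on serving economics from the review's figures; the electricity
# rate is an assumption, swap in your own.
LOAD_WATTS = 65            # peak AI load per the review
TOK_PER_SEC = 20           # mid-range 7B Q4 throughput
USD_PER_KWH = 0.15         # assumed utility rate

hours_per_mtok = 1_000_000 / TOK_PER_SEC / 3600
kwh_per_mtok = hours_per_mtok * LOAD_WATTS / 1000
print(f"{kwh_per_mtok:.2f} kWh (~${kwh_per_mtok * USD_PER_KWH:.2f}) per 1M tokens")
```

At roughly 0.9 kWh per million tokens, electricity is a rounding error even for a node generating around the clock.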
This is where the P3 Tiny’s capability becomes concrete. Based on the 8 GB shared memory and 90 GB/s bandwidth, here’s the practical model compatibility:
Sweet spot: 7B–8B models at Q4_K_M quantization. That gives the best quality-to-speed tradeoff on this hardware. For developers running local chatbots, RAG pipelines, or small agentic workflows, this is the configuration to target.
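A model qualifies for that sweet spot when quantized weights, KV cache, and runtime overhead all fit inside the ~8 GB budget. Here's a rough fit-check sketch; the 4.5 bits-per-weight figure for Q4_K_M and the 0.8 GB overhead constant are assumptions, and the layer/head shape used is Llama-2-7B's:

```python
def fits(params_b: float, bits: float, n_layers: int, kv_heads: int,
         head_dim: int, ctx: int, budget_gb: float = 8.0) -> bool:
    """Return True if a model's working set fits the shared-memory budget."""
    weights_gb = params_b * bits / 8                     # 7B at 4.5 bpw ~ 3.9 GB
    # FP16 KV cache: 2 tensors (K, V) x 2 bytes x layers x kv_heads x head_dim x ctx
    kv_gb = 2 * 2 * n_layers * kv_heads * head_dim * ctx / 1e9
    overhead_gb = 0.8                                    # assumed runtime/activation slack
    return weights_gb + kv_gb + overhead_gb <= budget_gb

# Llama-2-7B shape: 32 layers, 32 KV heads (no GQA), head_dim 128, 4K context
print(fits(params_b=7, bits=4.5, n_layers=32, kv_heads=32,
           head_dim=128, ctx=4096))                      # True, barely
```

Doubling the context to 8K roughly doubles the KV term to ~4.3 GB and the check fails, which is exactly the tight KV cache budget the editor note flags.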
The Lenovo ThinkCentre P3 Tiny Gen 2 is not for everyone. Here's how to decide whether it fits your workload:
Inference vs. training: This is a pure inference device. If you need to fine-tune even a 7B model, look elsewhere (RTX 4090 or cloud GPU). But for running pre-trained models, the P3 Tiny delivers reliable, low-latency performance.
Two realistic alternatives at a similar price/performance tier:
- Apple Mac Mini M4 (16GB unified memory, $599 base)
- Intel NUC 13 Extreme (Core i7-13700K + optional RTX 4060, ~$1,500)
Where the P3 Tiny wins: Enterprise manageability, silent operation, 65W TDP, and a form factor that can be mounted anywhere. For teams that need to deploy 50+ inference nodes with centralized IT control, the Lenovo is the clear choice. For raw token-per-second performance, the Mac Mini or NUC will beat it—but they lack the fleet-friendly features that make the P3 Tiny a production-ready AI PC for business.
| Model | Vendor | Params | Grade | Speed | Memory required |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | A | 13.5 tok/s | 5.4 GB |
| | | 8B | B | 12.8 tok/s | 5.7 GB |
| | | 9B | B | 12.0 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | B | 15.1 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | B | 19.5 tok/s | 3.7 GB |
| Mistral 7B Instruct | Mistral AI | 7B | B | 11.3 tok/s | 6.4 GB |
| Gemma 4 E4B IT | Google | 4B | C | 10.5 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | C | 10.5 tok/s | 6.9 GB |
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | D | 8.5 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | D | 8.5 tok/s | 8.5 GB |
| Llama 2 13B Chat | Meta | 13B | D | 8.6 tok/s | 8.5 GB |
| | | 8B | F | 5.4 tok/s | 13.3 GB |
| Qwen3.5-9B | Alibaba | 9B | F | 2.9 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 1.9 tok/s | 39.0 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | F | 6.6 tok/s | 11.0 GB |
| Qwen3.6-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 1.7 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 0.9 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba | 32.8B | F | 1.3 tok/s | 53.9 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | F | 3.0 tok/s | 24.4 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | F | 6.4 tok/s | 11.4 GB |
| LLaMA 65B | Meta | 65B | F | 1.8 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | F | 1.7 tok/s | 43.4 GB |
| | | 70B | F | 1.6 tok/s | 45.7 GB |