
A corporate-targeted GB10 mini-tower equipped with a 1TB SSD, providing a cost-optimized platform for local AI development.
The Lenovo ThinkStation PGX - 1TB is a specialized, small-form-factor workstation designed to bridge the gap between consumer-grade workstations and enterprise-grade data center infrastructure. Built around the NVIDIA GB10 Grace Blackwell architecture, this mini-tower is a dedicated platform for local AI development, fine-tuning, and inference. Unlike traditional workstations that rely on x86 architectures, the PGX utilizes a 20-core Arm-based CPU (10 Cortex-X925 performance cores and 10 Cortex-A725 efficiency cores) paired with Blackwell-generation Tensor cores, providing a high-efficiency environment for running large language models (LLMs) and agentic workflows at the edge.
For AI engineers and researchers, the ThinkStation PGX represents a move toward decentralized AI. It offers a "sandbox" environment that mirrors the NVIDIA software stack found in DGX systems but at a $4,100 MSRP. This makes it a primary contender for organizations that need to keep sensitive data local while maintaining the performance required for modern, high-parameter models. In the market for AI PCs and laptops, the PGX stands out by prioritizing VRAM capacity and unified memory over traditional desktop versatility.
The core value proposition of the Lenovo ThinkStation PGX - 1TB for AI is its 128GB of unified LPDDR5x memory. In local AI, memory capacity is the primary bottleneck: a model that does not fit in GPU-accessible memory either fails to load or spills into slower tiers, causing performance to crater. Because the PGX's pool is unified, the CPU and GPU share the same 128GB, so nearly all of it can be devoted to model weights and KV cache rather than being split between VRAM and system RAM.
The 273 GB/s memory bandwidth is a critical spec for token generation: autoregressive decoding is typically memory-bandwidth-bound, because every generated token requires streaming the active model weights from memory. While lower than high-end H100 or B200 data center GPUs, it is significantly higher than most consumer-grade laptops and matches or exceeds many high-end desktop configurations. This bandwidth helps the Lenovo ThinkStation PGX - 1TB's AI inference performance remain stable even when processing long context windows. Furthermore, the 140W TDP is remarkably efficient for a machine capable of 250 TOPS, making it suitable for continuous edge deployment where power and heat management are concerns.
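As a rough sanity check (a back-of-the-envelope sketch, not a vendor benchmark): if each decoded token streams the full active weight set once, peak bandwidth divided by the quantized weight size gives an optimistic tokens-per-second ceiling. The overhead terms ignored here (KV-cache reads, scheduling) are assumptions.

```python
# Rough, bandwidth-bound upper bound on decode speed (illustrative only).
# Assumes every generated token reads the full active weight set once and
# ignores KV-cache traffic, so real throughput will be lower.

MEM_BANDWIDTH_GBS = 273  # GB/s, the PGX's LPDDR5x spec

def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float) -> float:
    """Optimistic tokens/s = bandwidth / GB of weights read per token."""
    weights_gb = active_params_b * (bits_per_weight / 8)
    return MEM_BANDWIDTH_GBS / weights_gb

# A dense 70B model at 4-bit reads ~35 GB per token: ~7.8 tok/s ceiling.
print(f"70B dense @ 4-bit: ~{decode_ceiling_tok_s(70, 4):.1f} tok/s")
# An MoE like Qwen3-30B-A3B activates only ~3B params per token: ~182 tok/s.
print(f"3B active @ 4-bit: ~{decode_ceiling_tok_s(3, 4):.1f} tok/s")
```

This is why MoE models and aggressive quantization punch above their weight on bandwidth-limited hardware: both shrink the bytes that must move per token.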
The ThinkStation PGX is specifically advertised as hardware for running 200B-parameter models. This is made possible by the 128GB unified memory pool combined with 4-bit or 5-bit quantization (GGUF, EXL2, or AWQ formats).
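The arithmetic behind that claim is straightforward; the sketch below shows why 4-bit quantization is the enabling factor (the ~10% runtime overhead and reserved headroom are illustrative assumptions, not measured values).

```python
# Why 128 GB of unified memory fits a ~200B-parameter model at 4-bit.
# Overhead and reserved figures below are illustrative assumptions.

UNIFIED_MEMORY_GB = 128

def fits(params_b: float, bits_per_weight: float,
         overhead_frac: float = 0.10, reserved_gb: float = 8.0) -> bool:
    weights_gb = params_b * bits_per_weight / 8          # raw quantized weights
    total_gb = weights_gb * (1 + overhead_frac) + reserved_gb
    print(f"{params_b:.0f}B @ {bits_per_weight}-bit -> ~{total_gb:.0f} GB needed")
    return total_gb <= UNIFIED_MEMORY_GB

fits(200, 4)    # ~118 GB: fits inside the 128 GB pool
fits(200, 16)   # ~448 GB: hopeless without quantization
```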
While actual throughput depends on the specific quantization and runtime (TensorRT-LLM vs. llama.cpp), the benchmark table at the end of this section gives representative per-model figures.
The Lenovo ThinkStation PGX - 1TB is not a general-purpose gaming rig or a standard office PC. It is a specialized tool for:
Developers building agentic workflows using frameworks like LangChain, CrewAI, or AutoGPT need reliable, local inference to iterate quickly without incurring cloud API costs or latency. The 128GB of unified memory allows multiple models to run simultaneously (e.g., a primary reasoning model like DeepSeek-R1 alongside a smaller embedding model); a minimal local-endpoint sketch follows these use cases.
For industries like healthcare, finance, or defense, the PGX acts as a secure node for local LLM deployment. Its small form factor (150mm x 150mm) allows it to be tucked away in server closets or integrated into medical imaging carts to provide real-time data synthesis without sending data to the cloud.
While the 29.71 TFLOPS of FP16 performance is modest compared to a full DGX H100, the 128GB VRAM makes it an excellent "sandbox" for Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA. Researchers can prototype fine-tuning runs on large models locally before scaling to a cluster.
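For the agentic use case above, the sketch below assumes a llama.cpp or vLLM server is already running a model on the PGX behind an OpenAI-compatible API; the base URL, API key, and model name are placeholders. Agent frameworks like LangChain or CrewAI can be pointed at the same endpoint.

```python
# Minimal local-inference call against an OpenAI-compatible server
# (e.g., a vLLM or llama.cpp server running on the PGX).
# The base_url, api_key, and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b-q4",  # whatever name the local server registered
    messages=[{"role": "user", "content": "Summarize this incident report..."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Because the endpoint is local, an agent loop can make dozens of calls per iteration with no per-token billing and no data leaving the machine.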
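For the fine-tuning sandbox use case, a minimal QLoRA setup with Hugging Face transformers and peft might look like the sketch below. The model ID and LoRA hyperparameters are illustrative, and bitsandbytes availability on the Arm/CUDA stack should be verified before relying on this path.

```python
# QLoRA prototyping sketch: load a base model in 4-bit, then attach
# low-rank adapters so only a small fraction of weights are trained.
# Model ID and hyperparameters are illustrative, not recommendations.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb, device_map="auto"
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total weights
# ...then hand `model` to a Trainer / SFTTrainer for the actual run.
```

Because only the adapter weights receive gradients, the memory cost of a prototype run stays close to the 4-bit inference footprint, which is what makes large-model PEFT feasible on a single 128GB machine.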
When evaluating the Lenovo ThinkStation PGX - 1TB against competitors, the primary alternatives are the Apple Mac Studio (M2/M3 Ultra) and custom multi-GPU Linux desktops (dual RTX 3090s or 4090s).
The Mac Studio is the closest competitor in terms of a compact, high-VRAM "AI PC."
A custom PC with two RTX 4090s provides 48GB of VRAM and significantly higher raw compute (TFLOPS), but that memory is split across two 24GB cards, so any model larger than a single card requires tensor or pipeline parallelism rather than one contiguous pool.
The Lenovo ThinkStation PGX - 1TB is the best AI chip for local deployment when the priority is model size and ecosystem compatibility over raw floating-point speed. It is a purpose-built "inference appliance" that simplifies the path from development to local production.
Representative model benchmarks on the ThinkStation PGX:

| Model | Developer | Parameters | Rating | Throughput | Memory |
| --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |