
Lenovo's flagship corporate GB10 workstation, offering high storage capacity and native QSFP scaling for model expansion.
The Lenovo ThinkStation PGX - 4TB represents a shift in enterprise-grade AI hardware, moving away from traditional x86/Discrete-GPU architectures toward a unified system-on-chip (SoC) design. Built around the NVIDIA GB10 Grace Blackwell Superchip, this workstation is engineered for high-density AI inference and edge deployment where power efficiency and VRAM capacity are more critical than raw desktop gaming performance.
While categorized under AI PCs & Laptops, the PGX is a mini-tower workstation that functions more like a localized node of a data center. It is designed specifically for organizations that need to run large language models (LLMs) and agentic workflows locally to maintain data sovereignty or reduce latency. With 128GB of unified memory and a compact 1.2kg form factor, it competes directly with high-end Mac Studio (M2/M3 Ultra) configurations and specialized NVIDIA RTX 6000 Ada workstations, but at a more aggressive power envelope of just 140W.
For AI engineers, the ThinkStation PGX matters because it solves the VRAM bottleneck. Most consumer hardware caps out at 16GB or 24GB, forcing practitioners to use heavy quantization or multi-GPU clusters. The PGX provides a unified 128GB pool, allowing for the deployment of massive models in a footprint no larger than a standard desktop router.
The core of the ThinkStation PGX is the NVIDIA GB10 architecture, which bridges the gap between the professional RTX line and the H100/B100 data center GPUs.
The 128GB of LPDDR5x unified memory is the standout feature for AI development. Unlike traditional PC builds where the CPU and GPU have separate memory pools, the GB10 utilizes a 256-bit memory interface with a bandwidth of 273 GB/s. While this is lower than the bandwidth found on H100 systems, it is optimized for high-capacity inference. For local LLM performance, memory bandwidth is the primary governor of tokens per second (t/s); at 273 GB/s, the PGX delivers a fluid experience for single-user or small-team inference.
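Because decode is memory-bandwidth bound, a rough ceiling on tokens per second is simply bandwidth divided by the bytes of active weights streamed per token. A minimal sketch of that arithmetic, using the 273 GB/s figure above and illustrative model sizes (the model sizes are assumptions, not measurements):

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound LLM:
# generating each token requires streaming all active weights once.

def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Theoretical ceiling: bandwidth / bytes read per generated token."""
    return bandwidth_gbs / model_gb

PGX_BANDWIDTH = 273.0  # GB/s, per the spec above

# Illustrative: a 70B model at ~4.5 bits/weight occupies roughly 40 GB.
print(max_tokens_per_sec(PGX_BANDWIDTH, 40.0))  # ~6.8 tok/s ceiling
# An 8B model at 8-bit (~8 GB) has a far higher ceiling.
print(max_tokens_per_sec(PGX_BANDWIDTH, 8.0))   # ~34 tok/s ceiling
```

Real throughput lands below these ceilings (compute overhead, KV cache reads), but the model explains why bandwidth, not TFLOPS, governs single-user inference speed.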
The system runs on a 20-core ARM architecture (10x Cortex-X925 and 10x Cortex-A725). By moving away from x86, Lenovo and NVIDIA have achieved a 140W TDP. This is significantly lower than a comparable Intel/NVIDIA dual-RTX 3090 setup, which would pull over 700W. For edge AI deployment, this thermal efficiency allows the PGX to operate in environments without specialized data center cooling.
The 128GB VRAM capacity changes the math for local AI. Practitioners are no longer limited to "small" 7B or 8B models. The Lenovo ThinkStation PGX - 4TB is a "Goldilocks" machine for the current generation of open-weights models.
For the best quality-to-speed tradeoff, Q6_K or Q8_0 quantization is the sweet spot for this hardware. While 4-bit quantization is common on consumer cards to save space, the 128GB buffer on the PGX allows you to use higher-bit weights, significantly reducing "hallucination" rates and improving reasoning capabilities in complex agentic workflows.
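To see why the 128GB pool makes higher-bit quantization practical, a back-of-envelope memory estimate helps. The bits-per-weight figures below approximate common llama.cpp quantization formats, and the 10% overhead factor is an assumption covering buffers and activations:

```python
def model_memory_gb(params_b: float, bits_per_weight: float,
                    overhead: float = 1.1) -> float:
    """Approximate weight memory in GB for a dense model.
    params_b: parameter count in billions; overhead is an assumed
    10% margin for buffers and activations."""
    return params_b * bits_per_weight / 8 * overhead

# Approximate effective bits/weight for common llama.cpp quants
for quant, bits in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"70B @ {quant}: ~{model_memory_gb(70, bits):.0f} GB")
```

Even a 70B model at Q8_0 (~82 GB) fits in the 128GB pool with room to spare, whereas a 24GB consumer card forces aggressive 4-bit (or lower) quantization.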
The 128GB pool is ideal for multimodal models like Llava v1.6 or Chameleon, where both image embeddings and text weights must reside in memory. Additionally, for RAG (Retrieval-Augmented Generation) tasks, the PGX can handle massive KV caches, allowing for 100k+ token context windows without OOM (Out of Memory) errors.
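The KV cache claim can be checked with a standard sizing formula: two tensors (K and V) per layer, scaled by KV heads, head dimension, and context length. The config below is an assumed Llama-3-70B-style layout (80 layers, grouped-query attention with 8 KV heads) with an fp16 cache:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim
    * tokens * bytes per element, in GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Assumed 70B-class config with GQA, fp16 cache, 100k-token context
print(kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128,
                  context_len=100_000))  # ~33 GB
```

At roughly 33 GB, a 100k-token cache coexists with a quantized 70B model inside the 128GB pool, which is exactly the scenario a 24GB card cannot touch.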
For developers building agentic workflows (using frameworks like LangChain, CrewAI, or AutoGPT), the PGX serves as a reliable local execution environment. It can host the LLM, a vector database (Milvus/Weaviate), and the embedding model simultaneously without memory contention.
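A hypothetical memory budget illustrates the "no contention" point. Every component size below is an illustrative assumption, not a measurement of any specific stack:

```python
# Hypothetical memory budget for co-hosting an agent stack in the
# 128 GB unified pool. All component sizes are illustrative assumptions.
BUDGET_GB = 128
stack = {
    "llm_weights_q6": 58,   # e.g. a 70B model at ~6.6 bits/weight
    "kv_cache": 33,         # long-context KV cache
    "embedding_model": 5,   # e.g. a 7B embedding model, quantized
    "vector_db": 8,         # in-memory index (Milvus/Weaviate)
    "os_and_runtime": 12,   # OS, drivers, serving framework
}
used = sum(stack.values())
print(f"used {used} GB of {BUDGET_GB} GB, headroom {BUDGET_GB - used} GB")
```

On discrete-GPU systems this stack would be split awkwardly between VRAM and system RAM; in a unified pool it is one allocator's problem.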
The "Corporate" branding isn't just marketing; the PGX is built for "black box" AI deployments in secure, on-premises environments where data cannot leave the building.
While the PGX can handle light PEFT (Parameter-Efficient Fine-Tuning) such as LoRA or QLoRA, it is primarily an inference-first machine. Teams looking to do full-scale pre-training should look toward NVIDIA HGX clusters. However, for serving models to a department of 20-50 users, the PGX is a cost-effective alternative to expensive cloud API bills.
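The reason LoRA-style PEFT is feasible here while full fine-tuning is not comes down to trainable parameter counts: each adapted weight matrix adds only two small low-rank factors. A sketch of that arithmetic, assuming a generic 8B-class transformer (32 layers, hidden size 4096, four adapted matrices per layer, all assumptions):

```python
def lora_trainable_params(n_layers: int, d_model: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Trainable parameters for LoRA: each adapted d x d matrix adds
    two low-rank factors, A (d x r) and B (r x d)."""
    return n_layers * targets_per_layer * 2 * d_model * rank

base = 8e9  # assumed 8B-parameter base model
lora = lora_trainable_params(n_layers=32, d_model=4096, rank=16)
print(f"LoRA params: {lora / 1e6:.1f}M ({lora / base:.3%} of base)")
```

Training ~0.2% of the weights keeps optimizer state and gradients tiny, which is why LoRA fits comfortably in the PGX's memory and power envelope while full pre-training does not.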
The Mac Studio is the closest competitor in terms of unified memory. However, the PGX holds the advantage in software ecosystem: native CUDA support means most AI tooling runs out of the box, without the Metal/MPS compatibility work that Apple Silicon often requires.
A dual-RTX 6000 Ada setup provides 96GB of VRAM and higher raw TFLOPS, but at a significantly higher price point (approx. $14,000+) and power draw (600W+). The PGX provides more VRAM (128GB) and a more stable, integrated thermal solution for $5,079, making it the better choice for organizations prioritizing "set-and-forget" reliability over raw gaming-derived horsepower.
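The power-draw gap compounds over time. A back-of-envelope annual energy comparison for 24/7 operation, assuming an illustrative electricity price of $0.15/kWh:

```python
# Annual energy comparison for always-on operation.
# The $0.15/kWh electricity price is an illustrative assumption.

def annual_kwh(watts: float, hours: float = 24 * 365) -> float:
    """Energy consumed per year in kWh at a constant draw."""
    return watts * hours / 1000

PRICE_PER_KWH = 0.15
for name, watts in [("PGX (140W)", 140), ("dual RTX 6000 Ada rig (600W)", 600)]:
    kwh = annual_kwh(watts)
    print(f"{name}: {kwh:.0f} kWh/yr, ~${kwh * PRICE_PER_KWH:.0f}/yr")
```

The delta (roughly 4,000 kWh/yr at these assumptions) also translates into cooling load, which matters for the office and edge closets this machine targets.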
For practitioners looking for the best hardware for local AI agents in 2026, the ThinkStation PGX offers a unique balance: it provides the VRAM of a data center card with the footprint and power draw of a standard workstation.
| Model | Developer | Parameters | Rating | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |