
Lenovo's flagship corporate GB10 workstation, offering high storage capacity and native QSFP scaling for model expansion.
The Lenovo ThinkStation PGX - 4TB represents a shift in enterprise-grade AI hardware, moving away from traditional x86/Discrete-GPU architectures toward a unified system-on-chip (SoC) design. Built around the NVIDIA GB10 Grace Blackwell Superchip, this workstation is engineered for high-density AI inference and edge deployment where power efficiency and VRAM capacity are more critical than raw desktop gaming performance.
While categorized under AI PCs & Laptops, the PGX is a mini-tower workstation that functions more like a localized node of a data center. It is designed specifically for organizations that need to run large language models (LLMs) and agentic workflows locally to maintain data sovereignty or reduce latency. With 128GB of unified memory and a compact 1.2kg form factor, it competes directly with high-end Mac Studio (M2/M3 Ultra) configurations and specialized NVIDIA RTX 6000 Ada workstations, but at a more aggressive power envelope of just 140W.
For AI engineers, the ThinkStation PGX matters because it solves the VRAM bottleneck. Most consumer hardware caps out at 16GB or 24GB, forcing practitioners to use heavy quantization or multi-GPU clusters. The PGX provides a unified 128GB pool, allowing for the deployment of massive models in a footprint no larger than a standard desktop router.
The core of the ThinkStation PGX is the NVIDIA GB10 architecture, which bridges the gap between the professional RTX line and the H100/B100 data center GPUs.
The 128GB of LPDDR5x unified memory is the standout feature for AI development. Unlike traditional PC builds where the CPU and GPU have separate memory pools, the GB10 utilizes a 256-bit memory interface with a bandwidth of 273 GB/s. While this is lower than the bandwidth found on H100 systems, it is optimized for high-capacity inference. For local LLM performance, memory bandwidth is the primary governor of tokens per second (t/s); at 273 GB/s, the PGX delivers a fluid experience for single-user or small-team inference.
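Because decode is memory-bandwidth bound, a rough ceiling on tokens per second is simply bandwidth divided by the bytes of active weights streamed per token. A minimal sketch of that arithmetic, using the 273 GB/s figure above and illustrative model sizes (the model sizes are assumptions, not measurements):

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound LLM:
# generating each token requires streaming all active weights once.

def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Theoretical ceiling: bandwidth / bytes read per generated token."""
    return bandwidth_gbs / model_gb

PGX_BANDWIDTH = 273.0  # GB/s, per the spec above

# Illustrative: a 70B model at ~4.5 bits/weight occupies roughly 40 GB.
print(max_tokens_per_sec(PGX_BANDWIDTH, 40.0))  # ~6.8 tok/s ceiling
# An 8B model at 8-bit (~8 GB) has a far higher ceiling.
print(max_tokens_per_sec(PGX_BANDWIDTH, 8.0))   # ~34 tok/s ceiling
```

Real throughput lands below these ceilings (compute overhead, KV cache reads), but the model explains why bandwidth, not TFLOPS, governs single-user inference speed.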
The system runs on a 20-core ARM architecture (10x Cortex-X925 and 10x Cortex-A725). By moving away from x86, Lenovo and NVIDIA have achieved a 140W TDP. This is significantly lower than a comparable Intel/NVIDIA dual-RTX 3090 setup, which would pull over 700W. For edge AI deployment, this thermal efficiency allows the PGX to operate in environments without specialized data center cooling.
The 128GB VRAM capacity changes the math for local AI. Practitioners are no longer limited to "small" 7B or 8B models. The Lenovo ThinkStation PGX - 4TB is a "Goldilocks" machine for the current generation of open-weights models.
For the best quality-to-speed tradeoff, Q6_K or Q8_0 quantization is the sweet spot for this hardware. While 4-bit quantization is common on consumer cards to save space, the 128GB buffer on the PGX allows you to use higher-bit weights, significantly reducing "hallucination" rates and improving reasoning capabilities in complex agentic workflows.
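To see why the 128GB pool makes higher-bit quantization practical, a back-of-envelope memory estimate helps. The bits-per-weight figures below approximate common llama.cpp quantization formats, and the 10% overhead factor is an assumption covering buffers and activations:

```python
def model_memory_gb(params_b: float, bits_per_weight: float,
                    overhead: float = 1.1) -> float:
    """Approximate weight memory in GB for a dense model.
    params_b: parameter count in billions; overhead is an assumed
    10% margin for buffers and activations."""
    return params_b * bits_per_weight / 8 * overhead

# Approximate effective bits/weight for common llama.cpp quants
for quant, bits in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"70B @ {quant}: ~{model_memory_gb(70, bits):.0f} GB")
```

Even a 70B model at Q8_0 (~82 GB) fits in the 128GB pool with room to spare, whereas a 24GB consumer card forces aggressive 4-bit (or lower) quantization.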
The 128GB pool is ideal for multimodal models like Llava v1.6 or Chameleon, where both image embeddings and text weights must reside in memory. Additionally, for RAG (Retrieval-Augmented Generation) tasks, the PGX can handle massive KV caches, allowing for 100k+ token context windows without OOM (Out of Memory) errors.
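The KV cache claim can be checked with a standard sizing formula: two tensors (K and V) per layer, scaled by KV heads, head dimension, and context length. The config below is an assumed Llama-3-70B-style layout (80 layers, grouped-query attention with 8 KV heads) with an fp16 cache:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim
    * tokens * bytes per element, in GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Assumed 70B-class config with GQA, fp16 cache, 100k-token context
print(kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128,
                  context_len=100_000))  # ~33 GB
```

At roughly 33 GB, a 100k-token cache coexists with a quantized 70B model inside the 128GB pool, which is exactly the scenario a 24GB card cannot touch.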
For developers building agentic workflows (using frameworks like LangChain, CrewAI, or AutoGPT), the PGX serves as a reliable local execution environment. It can host the LLM, a vector database (Milvus/Weaviate), and the embedding model simultaneously without memory contention.
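A hypothetical memory budget illustrates the "no contention" point. Every component size below is an illustrative assumption, not a measurement of any specific stack:

```python
# Hypothetical memory budget for co-hosting an agent stack in the
# 128 GB unified pool. All component sizes are illustrative assumptions.
BUDGET_GB = 128
stack = {
    "llm_weights_q6": 58,   # e.g. a 70B model at ~6.6 bits/weight
    "kv_cache": 33,         # long-context KV cache
    "embedding_model": 5,   # e.g. a 7B embedding model, quantized
    "vector_db": 8,         # in-memory index (Milvus/Weaviate)
    "os_and_runtime": 12,   # OS, drivers, serving framework
}
used = sum(stack.values())
print(f"used {used} GB of {BUDGET_GB} GB, headroom {BUDGET_GB - used} GB")
```

On discrete-GPU systems this stack would be split awkwardly between VRAM and system RAM; in a unified pool it is one allocator's problem.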
The "Corporate" branding isn't just marketing; the PGX is built for "black box" AI deployments in secure, on-premises environments where data cannot leave the building.
While the PGX can handle light PEFT (Parameter-Efficient Fine-Tuning) such as LoRA or QLoRA, it is primarily an inference-first machine. Teams looking to do full-scale pre-training should look toward NVIDIA HGX clusters. However, for serving models to a department of 20-50 users, the PGX is a cost-effective alternative to expensive cloud API bills.
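The reason LoRA-style PEFT is feasible here while full fine-tuning is not comes down to trainable parameter counts: each adapted weight matrix adds only two small low-rank factors. A sketch of that arithmetic, assuming a generic 8B-class transformer (32 layers, hidden size 4096, four adapted matrices per layer, all assumptions):

```python
def lora_trainable_params(n_layers: int, d_model: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Trainable parameters for LoRA: each adapted d x d matrix adds
    two low-rank factors, A (d x r) and B (r x d)."""
    return n_layers * targets_per_layer * 2 * d_model * rank

base = 8e9  # assumed 8B-parameter base model
lora = lora_trainable_params(n_layers=32, d_model=4096, rank=16)
print(f"LoRA params: {lora / 1e6:.1f}M ({lora / base:.3%} of base)")
```

Training ~0.2% of the weights keeps optimizer state and gradients tiny, which is why LoRA fits comfortably in the PGX's memory and power envelope while full pre-training does not.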
The Mac Studio is the closest competitor in terms of unified memory. However, the PGX holds the advantage in software ecosystem: native CUDA support means most AI tooling runs out of the box, without the Metal/MPS compatibility work that Apple Silicon often requires.
A dual-RTX 6000 Ada setup provides 96GB of VRAM and higher raw TFLOPS, but at a significantly higher price point (approx. $14,000+) and power draw (600W+). The PGX provides more VRAM (128GB) and a more stable, integrated thermal solution for $5,079, making it the better choice for organizations prioritizing "set-and-forget" reliability over raw gaming-derived horsepower.
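The power-draw gap compounds over time. A back-of-envelope annual energy comparison for 24/7 operation, assuming an illustrative electricity price of $0.15/kWh:

```python
# Annual energy comparison for always-on operation.
# The $0.15/kWh electricity price is an illustrative assumption.

def annual_kwh(watts: float, hours: float = 24 * 365) -> float:
    """Energy consumed per year in kWh at a constant draw."""
    return watts * hours / 1000

PRICE_PER_KWH = 0.15
for name, watts in [("PGX (140W)", 140), ("dual RTX 6000 Ada rig (600W)", 600)]:
    kwh = annual_kwh(watts)
    print(f"{name}: {kwh:.0f} kWh/yr, ~${kwh * PRICE_PER_KWH:.0f}/yr")
```

The delta (roughly 4,000 kWh/yr at these assumptions) also translates into cooling load, which matters for the office and edge closets this machine targets.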
For practitioners looking for the best hardware for local AI agents in 2026, the ThinkStation PGX offers a unique balance: it provides the VRAM of a data center card with the footprint and power draw of a standard workstation.
| Model | Developer | Parameters | Rating | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |