
A corporate-targeted GB10 mini-tower equipped with a 1TB SSD, providing a cost-optimized platform for local AI development.
The Lenovo ThinkStation PGX - 1TB is a specialized, small-form-factor workstation designed to bridge the gap between consumer-grade workstations and enterprise-grade data center infrastructure. Built around the NVIDIA GB10 Grace Blackwell architecture, this mini-tower is a dedicated platform for local AI development, fine-tuning, and inference. Unlike traditional workstations that rely on x86 architectures, the PGX utilizes a 20-core Arm-based CPU (10 Cortex-X925 performance cores and 10 Cortex-A725 efficiency cores) paired with Blackwell-generation Tensor cores, providing a high-efficiency environment for running large language models (LLMs) and agentic workflows at the edge.
For AI engineers and researchers, the ThinkStation PGX represents a move toward decentralized AI. It offers a "sandbox" environment that mirrors the NVIDIA software stack found in DGX systems but at a $4,100 MSRP. This makes it a primary contender for organizations that need to keep sensitive data local while maintaining the performance required for modern, high-parameter models. In the market for AI PCs and laptops, the PGX stands out by prioritizing VRAM capacity and unified memory over traditional desktop versatility.
The core value proposition of the Lenovo ThinkStation PGX - 1TB for AI is its 128GB of unified LPDDR5x memory. In local AI, memory capacity is the primary bottleneck: a model that does not fit in GPU-accessible memory either fails to load or spills into slower tiers, causing performance to crater. Because the PGX's pool is unified, the CPU and GPU share the same 128GB, so nearly all of it can be devoted to model weights and KV cache rather than being split between VRAM and system RAM.
The 273 GB/s memory bandwidth is a critical spec for token generation: autoregressive decoding is typically memory-bandwidth-bound, because every generated token requires streaming the active model weights from memory. While lower than high-end H100 or B200 data center GPUs, it is significantly higher than most consumer-grade laptops and matches or exceeds many high-end desktop configurations. This bandwidth helps the Lenovo ThinkStation PGX - 1TB's AI inference performance remain stable even when processing long context windows. Furthermore, the 140W TDP is remarkably efficient for a machine capable of 250 TOPS, making it suitable for continuous edge deployment where power and heat management are concerns.
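As a rough sanity check (a back-of-the-envelope sketch, not a vendor benchmark): if each decoded token streams the full active weight set once, peak bandwidth divided by the quantized weight size gives an optimistic tokens-per-second ceiling. The overhead terms ignored here (KV-cache reads, scheduling) are assumptions.

```python
# Rough, bandwidth-bound upper bound on decode speed (illustrative only).
# Assumes every generated token reads the full active weight set once and
# ignores KV-cache traffic, so real throughput will be lower.

MEM_BANDWIDTH_GBS = 273  # GB/s, the PGX's LPDDR5x spec

def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float) -> float:
    """Optimistic tokens/s = bandwidth / GB of weights read per token."""
    weights_gb = active_params_b * (bits_per_weight / 8)
    return MEM_BANDWIDTH_GBS / weights_gb

# A dense 70B model at 4-bit reads ~35 GB per token: ~7.8 tok/s ceiling.
print(f"70B dense @ 4-bit: ~{decode_ceiling_tok_s(70, 4):.1f} tok/s")
# An MoE like Qwen3-30B-A3B activates only ~3B params per token: ~182 tok/s.
print(f"3B active @ 4-bit: ~{decode_ceiling_tok_s(3, 4):.1f} tok/s")
```

This is why MoE models and aggressive quantization punch above their weight on bandwidth-limited hardware: both shrink the bytes that must move per token.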
The ThinkStation PGX is specifically advertised as hardware for running 200B-parameter models. This is made possible by the 128GB unified memory pool combined with 4-bit or 5-bit quantization (GGUF, EXL2, or AWQ formats).
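The arithmetic behind that claim is straightforward; the sketch below shows why 4-bit quantization is the enabling factor (the ~10% runtime overhead and reserved headroom are illustrative assumptions, not measured values).

```python
# Why 128 GB of unified memory fits a ~200B-parameter model at 4-bit.
# Overhead and reserved figures below are illustrative assumptions.

UNIFIED_MEMORY_GB = 128

def fits(params_b: float, bits_per_weight: float,
         overhead_frac: float = 0.10, reserved_gb: float = 8.0) -> bool:
    weights_gb = params_b * bits_per_weight / 8          # raw quantized weights
    total_gb = weights_gb * (1 + overhead_frac) + reserved_gb
    print(f"{params_b:.0f}B @ {bits_per_weight}-bit -> ~{total_gb:.0f} GB needed")
    return total_gb <= UNIFIED_MEMORY_GB

fits(200, 4)    # ~118 GB: fits inside the 128 GB pool
fits(200, 16)   # ~448 GB: hopeless without quantization
```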
While actual throughput depends on the specific quantization and runtime (TensorRT-LLM vs. llama.cpp), the benchmark table at the end of this section gives representative per-model figures.
The Lenovo ThinkStation PGX - 1TB is not a general-purpose gaming rig or a standard office PC. It is a specialized tool for:
Developers building agentic workflows using frameworks like LangChain, CrewAI, or AutoGPT need reliable, local inference to iterate quickly without incurring cloud API costs or latency. The 128GB of unified memory allows multiple models to run simultaneously (e.g., a primary reasoning model like DeepSeek-R1 alongside a smaller embedding model); a minimal local-endpoint sketch follows these use cases.
For industries like healthcare, finance, or defense, the PGX acts as a secure node for local LLM deployment. Its small form factor (150mm x 150mm) allows it to be tucked away in server closets or integrated into medical imaging carts to provide real-time data synthesis without sending data to the cloud.
While the 29.71 TFLOPS of FP16 performance is modest compared to a full DGX H100, the 128GB VRAM makes it an excellent "sandbox" for Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA. Researchers can prototype fine-tuning runs on large models locally before scaling to a cluster.
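For the agentic use case above, the sketch below assumes a llama.cpp or vLLM server is already running a model on the PGX behind an OpenAI-compatible API; the base URL, API key, and model name are placeholders. Agent frameworks like LangChain or CrewAI can be pointed at the same endpoint.

```python
# Minimal local-inference call against an OpenAI-compatible server
# (e.g., a vLLM or llama.cpp server running on the PGX).
# The base_url, api_key, and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b-q4",  # whatever name the local server registered
    messages=[{"role": "user", "content": "Summarize this incident report..."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Because the endpoint is local, an agent loop can make dozens of calls per iteration with no per-token billing and no data leaving the machine.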
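For the fine-tuning sandbox use case, a minimal QLoRA setup with Hugging Face transformers and peft might look like the sketch below. The model ID and LoRA hyperparameters are illustrative, and bitsandbytes availability on the Arm/CUDA stack should be verified before relying on this path.

```python
# QLoRA prototyping sketch: load a base model in 4-bit, then attach
# low-rank adapters so only a small fraction of weights are trained.
# Model ID and hyperparameters are illustrative, not recommendations.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb, device_map="auto"
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total weights
# ...then hand `model` to a Trainer / SFTTrainer for the actual run.
```

Because only the adapter weights receive gradients, the memory cost of a prototype run stays close to the 4-bit inference footprint, which is what makes large-model PEFT feasible on a single 128GB machine.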
When evaluating the Lenovo ThinkStation PGX - 1TB against competitors, the primary alternatives are the Apple Mac Studio (M2/M3 Ultra) and custom multi-GPU Linux desktops (dual RTX 3090s or 4090s).
The Mac Studio is the closest competitor in terms of a compact, high-VRAM "AI PC."
A custom PC with two RTX 4090s provides 48GB of VRAM and significantly higher raw compute (TFLOPS), but that memory is split across two 24GB cards, so any model larger than a single card requires tensor or pipeline parallelism rather than one contiguous pool.
The Lenovo ThinkStation PGX - 1TB is the best AI chip for local deployment when the priority is model size and ecosystem compatibility over raw floating-point speed. It is a purpose-built "inference appliance" that simplifies the path from development to local production.
Representative model benchmarks on the ThinkStation PGX:

| Model | Developer | Parameters | Rating | Throughput | Memory |
| --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |