
HP's scalable edge AI workstation incorporating sustainable chassis materials and the proprietary HP ZGX toolkit for model serving.
The HP ZGX Nano AI Station is a compact, enterprise-grade edge workstation designed to bridge the gap between local prototyping and data center deployment. It is built for AI engineers and ML researchers who need large VRAM capacity without the footprint or power draw of a traditional rack-mounted server. By integrating the NVIDIA Grace Blackwell architecture into a 150 mm square chassis, HP has created a dedicated "bridge" device that lets teams develop on the same software stack used in high-end clusters.
In the current market, the ZGX Nano occupies a unique niche. It can load far larger models than high-end consumer desktops, yet it is more efficient than traditional multi-GPU workstations. It competes directly with Apple’s Mac Studio (M2/M3 Ultra) and AMD’s Strix Halo-based mini-PCs. While consumer hardware often prioritizes gaming or creative workflows, the ZGX Nano is a purpose-built AI PC for running models locally, with a proprietary ZGX toolkit for streamlined model serving and experiment tracking.
For AI practitioners, the most important specification of the HP ZGX Nano AI Station is its unified memory architecture. With 128 GB of memory addressable as VRAM, this machine bypasses the PCIe bottlenecks associated with multi-GPU setups in small form factors.
While the 273 GB/s memory bandwidth is lower than that of a dedicated H100 or a top-tier RTX 4090, it is paired with a 20-core Arm CPU (10 Cortex-X925 + 10 Cortex-A725) and tuned for high-efficiency inference rather than raw training throughput. The 250 TOPS of INT8 performance makes it a powerhouse for quantized model execution, leveraging Blackwell-specific optimizations for the FP4 and FP8 data formats.
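As a rough illustration of what quantized local execution looks like, the sketch below serves a 4-bit checkpoint with the off-the-shelf llama-cpp-python runtime. The file path and model are hypothetical, and GGUF's 4-bit scheme is a CPU/GPU-portable format distinct from Blackwell's native FP4 path; this is a minimal sketch, not the ZGX toolkit's own serving flow.

```python
# Minimal sketch: running a 4-bit quantized model with llama-cpp-python.
# The GGUF path below is illustrative; any local quantized checkpoint works.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload every layer into the unified-memory GPU pool
    n_ctx=8192,       # context window; 128 GB leaves ample room for the KV cache
)

out = llm(
    "Summarize the tradeoffs of 4-bit quantization in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```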
Edge AI workloads often require rapid data ingestion. The ZGX Nano is equipped with a 10GbE ConnectX-7 interface, ensuring that model weights can be pulled from local registries or datasets at wire speed. It also supports Wi-Fi 7 and Bluetooth 5.4 for flexible edge deployments where hardwired infrastructure may be limited.
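In practice, that means a weights pull can be pointed at an internal mirror so downloads saturate the local link instead of the WAN. A minimal sketch with huggingface_hub, where the endpoint, repo id, and target directory are all assumptions:

```python
import os

# Point the hub client at a hypothetical internal mirror BEFORE importing it,
# so the 70B-class weights stream over the 10GbE link rather than the internet.
os.environ["HF_ENDPOINT"] = "http://models.internal.example:8080"  # assumption

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.1-70B-Instruct",  # illustrative repo id
    local_dir="/data/models/llama-3.1-70b",       # illustrative target path
)
print(f"weights cached at {local_dir}")
```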
The HP ZGX Nano AI Station's VRAM capacity for large language models is its primary selling point. The 128 GB buffer allows local execution of models that were previously restricted to cloud H100 instances.
The ZGX Nano is rated for models up to 200B parameters. This is a significant threshold for local AI agents and RAG (Retrieval-Augmented Generation) workflows.
Given the 273 GB/s bandwidth, decode throughput is roughly capped by how quickly the quantized weights can be streamed through memory for each generated token; the sketch below gives a back-of-envelope estimate, and per-model figures appear in the table at the end of this article.
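The roofline math also explains the 200B rating: at 4 bits per weight, a 200B model is roughly 100 GB of weights, which is why it still fits inside the 128 GB pool. The figures below are upper bounds under that simple model, not measurements:

```python
# Back-of-envelope decode estimate: generating one token streams the whole
# quantized weight set through memory once, so tok/s <= bandwidth / model_bytes.
# Real-world throughput lands below this roofline.
BANDWIDTH_GBPS = 273  # GB/s, per the ZGX Nano spec

def max_tokens_per_s(params_billion: float, bits_per_weight: int) -> float:
    model_gb = params_billion * bits_per_weight / 8  # weights footprint in GB
    return BANDWIDTH_GBPS / model_gb

for p, b in [(8, 4), (70, 4), (200, 4)]:
    print(f"{p}B @ {b}-bit: weights ~{p * b / 8:.0f} GB, "
          f"<= {max_tokens_per_s(p, b):.1f} tok/s")
# ->   8B @ 4-bit: weights ~4 GB,   <= 68.2 tok/s
# ->  70B @ 4-bit: weights ~35 GB,  <= 7.8 tok/s
# -> 200B @ 4-bit: weights ~100 GB, <= 2.7 tok/s
```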
The 128GB VRAM is a "sweet spot" for multimodal models like Llava or CogVLM, where both image encoders and large language decoders must reside in memory simultaneously. It also enables long-context tasks, such as analyzing entire codebases or long PDF documents, where the KV cache can grow to tens of gigabytes.
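For a sense of how large that KV cache actually gets, it can be estimated directly from the model architecture. The shape below is illustrative (a Llama-3-70B-like configuration with grouped-query attention), not a measurement:

```python
# Rough KV-cache sizing: 2 (K and V) * layers * kv_heads * head_dim
# * bytes/elem * context length. Architecture numbers are illustrative.
def kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                bytes_per_elem=2, context_len=128_000):
    return (2 * layers * kv_heads * head_dim
            * bytes_per_elem * context_len) / 1e9

print(f"~{kv_cache_gb():.1f} GB of KV cache at 128k context")  # ~41.9 GB
```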
The HP ZGX Nano AI Station is not a general-purpose computer; it is a specialized tool for specific AI development personas.
For those building autonomous agents, low latency for small models (8B) is critical. The ZGX Nano allows an agent to run local "thoughts" or "planning" steps at high speed while maintaining a massive local vector database in memory.
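A minimal sketch of that pattern follows: retrieval runs against an in-memory vector store while a small local model would handle the planning steps. The embeddings here are stand-in NumPy vectors; a real deployment would swap in one of the embedding models listed in the table below.

```python
# Sketch: in-memory vector retrieval for a local agent. A 1M x 1024 float32
# store is ~4 GB resident, a small slice of the 128 GB pool.
import numpy as np

doc_vecs = np.random.rand(1_000_000, 1024).astype(np.float32)  # stand-in embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)    # pre-normalize

def retrieve(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Cosine-similarity top-k over the in-memory store."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ q
    return np.argpartition(scores, -k)[-k:]

query = np.random.rand(1024).astype(np.float32)  # stand-in query embedding
print("top-k doc ids:", retrieve(query))
```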
The "Nano" designation and 150 x 150 x 51 mm dimensions make this ideal for on-premises deployment in sensitive environments (legal, medical, or industrial). The 10GbE interface and ConnectX-7 support allow it to act as a micro-inference server for an entire department.
Researchers can use the HP ZGX toolkit to prototype models locally using the same CUDA-based ecosystem they will use in production. This "bridge to the data center" reduces the friction of moving from a local .ipynb file to a multi-node cluster.
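As a sanity check before promoting a notebook to the cluster, a few plain PyTorch calls confirm that the local CUDA environment matches what the target nodes report; nothing below is specific to the ZGX toolkit.

```python
# Environment parity check: verify the local CUDA stack before a cluster run.
import torch

assert torch.cuda.is_available(), "CUDA device not visible"
print("device     :", torch.cuda.get_device_name(0))
print("capability :", torch.cuda.get_device_capability(0))
print("torch/cuda :", torch.__version__, torch.version.cuda)
```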
While the ZGX Nano is an inference powerhouse, it is not intended for training large models from scratch. It is, however, highly capable for Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA. The 128 GB of VRAM allows fine-tuning of 70B-parameter models at larger batch sizes than any consumer GPU can accommodate.
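A minimal QLoRA sketch with the Hugging Face peft and bitsandbytes libraries is shown below. The model id and hyperparameters are illustrative, and bitsandbytes support on Arm platforms may vary; the point is simply that a 4-bit base model plus LoRA adapters trains only a tiny fraction of the 70B weights.

```python
# Minimal QLoRA sketch: 4-bit-quantized base model + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",   # illustrative base model
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # a fraction of a percent of 70B
```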
When evaluating the best hardware for local AI agents in 2026, the ZGX Nano faces competition on three fronts:
The Mac Studio offers higher memory bandwidth (up to 800 GB/s), which generally translates to faster tokens per second for LLMs. However, the HP ZGX Nano runs the NVIDIA software stack (CUDA, TensorRT) shared with DGX-class systems such as the DGX Spark. For developers who need to deploy to NVIDIA-based clouds (AWS P5, Azure NDv5), the ZGX Nano provides a more consistent environment than the Apple Silicon/MLX ecosystem.
AMD-based mini-PCs (like the Bosgame M5) offer a competitive price-to-VRAM ratio. However, the HP ZGX Nano’s enterprise-grade networking (ConnectX-7) and its 250 TOPS INT8 performance give it the edge in professional environments where reliability and integration with NVIDIA's enterprise tools are required.
A dual-RTX 4090 setup provides more raw TFLOPS and 48GB of high-speed VRAM. However, that setup consumes 900W+ and requires a full-tower chassis. The ZGX Nano provides nearly triple the VRAM (128GB) in a chassis that fits in a backpack, drawing only 140W under load. For practitioners who prioritize model size (parameter count) over raw generation speed, the ZGX Nano is the superior choice.
Estimated on-device throughput and memory footprint for representative models:

| Model | Developer | Parameters | Rating | Speed | Memory |
| --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |