
HP's scalable edge AI workstation incorporating sustainable chassis materials and the proprietary HP ZGX toolkit for model serving.
The HP ZGX Nano AI Station is a compact, enterprise-grade edge workstation designed to bridge the gap between local prototyping and data center deployment. It is built for AI engineers and ML researchers who need large VRAM capacity without the footprint or power draw of a traditional rack-mounted server. By integrating the NVIDIA Grace Blackwell architecture into a 150 mm square chassis, HP has created a dedicated "bridge" device that lets teams develop on the same software stack used in high-end clusters.
In the current market, the ZGX Nano occupies a unique niche. It can load far larger models than high-end consumer desktops, yet it is more efficient than traditional multi-GPU workstations. It competes directly with Apple’s Mac Studio (M2/M3 Ultra) and AMD’s Strix Halo-based mini-PCs. While consumer hardware often prioritizes gaming or creative workflows, the ZGX Nano is a purpose-built AI PC for running models locally, with a proprietary ZGX toolkit for streamlined model serving and experiment tracking.
For AI practitioners, the most important specification of the HP ZGX Nano AI Station is its unified memory architecture. With 128 GB of memory addressable as VRAM, this machine bypasses the PCIe bottlenecks associated with multi-GPU setups in small form factors.
While the 273 GB/s memory bandwidth is lower than that of a dedicated H100 or a top-tier RTX 4090, it is paired with a 20-core Arm CPU (10 Cortex-X925 + 10 Cortex-A725) and tuned for high-efficiency inference rather than raw training throughput. The 250 TOPS of INT8 performance makes it a powerhouse for quantized model execution, leveraging Blackwell-specific optimizations for the FP4 and FP8 data formats.
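As a rough illustration of what quantized local execution looks like, the sketch below serves a 4-bit checkpoint with the off-the-shelf llama-cpp-python runtime. The file path and model are hypothetical, and GGUF's 4-bit scheme is a CPU/GPU-portable format distinct from Blackwell's native FP4 path; this is a minimal sketch, not the ZGX toolkit's own serving flow.

```python
# Minimal sketch: running a 4-bit quantized model with llama-cpp-python.
# The GGUF path below is illustrative; any local quantized checkpoint works.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload every layer into the unified-memory GPU pool
    n_ctx=8192,       # context window; 128 GB leaves ample room for the KV cache
)

out = llm(
    "Summarize the tradeoffs of 4-bit quantization in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```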
Edge AI workloads often require rapid data ingestion. The ZGX Nano is equipped with a 10GbE ConnectX-7 interface, ensuring that model weights can be pulled from local registries or datasets at wire speed. It also supports Wi-Fi 7 and Bluetooth 5.4 for flexible edge deployments where hardwired infrastructure may be limited.
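In practice, that means a weights pull can be pointed at an internal mirror so downloads saturate the local link instead of the WAN. A minimal sketch with huggingface_hub, where the endpoint, repo id, and target directory are all assumptions:

```python
import os

# Point the hub client at a hypothetical internal mirror BEFORE importing it,
# so the 70B-class weights stream over the 10GbE link rather than the internet.
os.environ["HF_ENDPOINT"] = "http://models.internal.example:8080"  # assumption

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.1-70B-Instruct",  # illustrative repo id
    local_dir="/data/models/llama-3.1-70b",       # illustrative target path
)
print(f"weights cached at {local_dir}")
```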
The HP ZGX Nano AI Station's VRAM capacity for large language models is its primary selling point. The 128 GB buffer allows local execution of models that were previously restricted to cloud H100 instances.
The ZGX Nano is rated for models up to 200B parameters. This is a significant threshold for local AI agents and RAG (Retrieval-Augmented Generation) workflows.
Given the 273 GB/s bandwidth, decode throughput is roughly capped by how quickly the quantized weights can be streamed through memory for each generated token; the sketch below gives a back-of-envelope estimate, and per-model figures appear in the table at the end of this article.
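The roofline math also explains the 200B rating: at 4 bits per weight, a 200B model is roughly 100 GB of weights, which is why it still fits inside the 128 GB pool. The figures below are upper bounds under that simple model, not measurements:

```python
# Back-of-envelope decode estimate: generating one token streams the whole
# quantized weight set through memory once, so tok/s <= bandwidth / model_bytes.
# Real-world throughput lands below this roofline.
BANDWIDTH_GBPS = 273  # GB/s, per the ZGX Nano spec

def max_tokens_per_s(params_billion: float, bits_per_weight: int) -> float:
    model_gb = params_billion * bits_per_weight / 8  # weights footprint in GB
    return BANDWIDTH_GBPS / model_gb

for p, b in [(8, 4), (70, 4), (200, 4)]:
    print(f"{p}B @ {b}-bit: weights ~{p * b / 8:.0f} GB, "
          f"<= {max_tokens_per_s(p, b):.1f} tok/s")
# ->   8B @ 4-bit: weights ~4 GB,   <= 68.2 tok/s
# ->  70B @ 4-bit: weights ~35 GB,  <= 7.8 tok/s
# -> 200B @ 4-bit: weights ~100 GB, <= 2.7 tok/s
```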
The 128GB VRAM is a "sweet spot" for multimodal models like Llava or CogVLM, where both image encoders and large language decoders must reside in memory simultaneously. It also enables long-context tasks, such as analyzing entire codebases or long PDF documents, where the KV cache can grow to tens of gigabytes.
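For a sense of how large that KV cache actually gets, it can be estimated directly from the model architecture. The shape below is illustrative (a Llama-3-70B-like configuration with grouped-query attention), not a measurement:

```python
# Rough KV-cache sizing: 2 (K and V) * layers * kv_heads * head_dim
# * bytes/elem * context length. Architecture numbers are illustrative.
def kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                bytes_per_elem=2, context_len=128_000):
    return (2 * layers * kv_heads * head_dim
            * bytes_per_elem * context_len) / 1e9

print(f"~{kv_cache_gb():.1f} GB of KV cache at 128k context")  # ~41.9 GB
```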
The HP ZGX Nano AI Station is not a general-purpose computer; it is a specialized tool for specific AI development personas.
For those building autonomous agents, low latency for small models (8B) is critical. The ZGX Nano allows an agent to run local "thoughts" or "planning" steps at high speed while maintaining a massive local vector database in memory.
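A minimal sketch of that pattern follows: retrieval runs against an in-memory vector store while a small local model would handle the planning steps. The embeddings here are stand-in NumPy vectors; a real deployment would swap in one of the embedding models listed in the table below.

```python
# Sketch: in-memory vector retrieval for a local agent. A 1M x 1024 float32
# store is ~4 GB resident, a small slice of the 128 GB pool.
import numpy as np

doc_vecs = np.random.rand(1_000_000, 1024).astype(np.float32)  # stand-in embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)    # pre-normalize

def retrieve(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Cosine-similarity top-k over the in-memory store."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ q
    return np.argpartition(scores, -k)[-k:]

query = np.random.rand(1024).astype(np.float32)  # stand-in query embedding
print("top-k doc ids:", retrieve(query))
```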
The "Nano" designation and 150 x 150 x 51 mm dimensions make this ideal for on-premises deployment in sensitive environments (legal, medical, or industrial). The 10GbE interface and ConnectX-7 support allow it to act as a micro-inference server for an entire department.
Researchers can use the HP ZGX toolkit to prototype models locally using the same CUDA-based ecosystem they will use in production. This "bridge to the data center" reduces the friction of moving from a local .ipynb file to a multi-node cluster.
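As a sanity check before promoting a notebook to the cluster, a few plain PyTorch calls confirm that the local CUDA environment matches what the target nodes report; nothing below is specific to the ZGX toolkit.

```python
# Environment parity check: verify the local CUDA stack before a cluster run.
import torch

assert torch.cuda.is_available(), "CUDA device not visible"
print("device     :", torch.cuda.get_device_name(0))
print("capability :", torch.cuda.get_device_capability(0))
print("torch/cuda :", torch.__version__, torch.version.cuda)
```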
While the ZGX Nano is an inference powerhouse, it is not intended for training large models from scratch. It is, however, highly capable for Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA. The 128 GB of VRAM allows fine-tuning of 70B-parameter models at larger batch sizes than any consumer GPU can accommodate.
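A minimal QLoRA sketch with the Hugging Face peft and bitsandbytes libraries is shown below. The model id and hyperparameters are illustrative, and bitsandbytes support on Arm platforms may vary; the point is simply that a 4-bit base model plus LoRA adapters trains only a tiny fraction of the 70B weights.

```python
# Minimal QLoRA sketch: 4-bit-quantized base model + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",   # illustrative base model
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # a fraction of a percent of 70B
```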
When evaluating the best hardware for local AI agents in 2026, the ZGX Nano faces competition on three fronts:
The Mac Studio offers higher memory bandwidth (up to 800 GB/s), which generally translates to faster tokens per second for LLMs. However, the HP ZGX Nano runs the NVIDIA software stack (CUDA, TensorRT) shared with DGX-class systems such as the DGX Spark. For developers who need to deploy to NVIDIA-based clouds (AWS P5, Azure NDv5), the ZGX Nano provides a more consistent environment than the Apple Silicon/MLX ecosystem.
AMD-based mini-PCs (like the Bosgame M5) offer a competitive price-to-VRAM ratio. However, the HP ZGX Nano’s enterprise-grade networking (ConnectX-7) and its 250 TOPS INT8 performance give it the edge in professional environments where reliability and integration with NVIDIA's enterprise tools are required.
A dual-RTX 4090 setup provides more raw TFLOPS and 48GB of high-speed VRAM. However, that setup consumes 900W+ and requires a full-tower chassis. The ZGX Nano provides nearly triple the VRAM (128GB) in a chassis that fits in a backpack, drawing only 140W under load. For practitioners who prioritize model size (parameter count) over raw generation speed, the ZGX Nano is the superior choice.
Estimated on-device throughput and memory footprint for representative models:

| Model | Developer | Parameters | Rating | Speed | Memory |
| --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |