An upcoming, rack-ready deskside supercomputer utilizing the GB300 chip for trillion-parameter autonomous AI agent deployment.
The HP ZGX Fury AI Station represents a significant shift in AI infrastructure, moving frontier-scale compute from the data center to a deskside, rack-ready form factor. Developed by HP and powered by the NVIDIA Blackwell Ultra (B300) architecture, this system is designed for organizations and researchers who need to deploy trillion-parameter models locally. It bridges the gap between high-end workstations and enterprise server clusters, offering a "deskside supercomputer" experience for teams building autonomous agentic workflows.
In the current market, the ZGX Fury sits in the ultra-high-end tier of AI PCs and laptops, though its performance profile aligns more closely with specialized AI servers such as the NVIDIA DGX series or custom-built Lambda Labs configurations. By building around the GB300 Grace Blackwell Ultra Desktop Superchip, HP has prioritized coherent memory and massive bandwidth, specifically targeting the high token consumption of local AI agents and large-scale inference.
For AI engineers, the most critical specification of the HP ZGX Fury is its 784 GB of coherent, unified memory, which the GPU can address like VRAM. This is made possible by the NVLink-C2C interconnect, which lets the Grace CPU and Blackwell Ultra GPU share a single high-speed, coherent memory pool. The architecture eliminates the traditional PCIe bottleneck, enabling the system to hold massive datasets and model weights with minimal latency.
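Frameworks see this pool as ordinary device memory, so a first sanity check is simply asking the driver what it exposes. The snippet below is a minimal sketch assuming a CUDA-enabled PyTorch build on the system; exactly how much of the coherent pool is reported as device memory depends on the driver and platform configuration.

```python
import torch

# Minimal sketch: query the memory the driver exposes for device 0.
# On a coherent NVLink-C2C system the CPU and GPU share one memory pool;
# how much of it shows up as "device" memory is platform-dependent.
props = torch.cuda.get_device_properties(0)
free_bytes, total_bytes = torch.cuda.mem_get_info(0)

print(f"Device: {props.name}")
print(f"Total reported device memory: {total_bytes / 1e9:.0f} GB")
print(f"Currently free:               {free_bytes / 1e9:.0f} GB")
```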
The HP ZGX Fury AI Station is one of the few single-node deskside systems capable of running 1-trillion-parameter models. This makes it the premier choice for practitioners working with frontier-level research models or massive internal ensembles.
The "sweet spot" for this hardware is running 400B+ parameter models at FP16 or BF16 precision. Unlike consumer hardware that requires 4-bit quantization (EXL2/GGUF) to fit large models, the ZGX Fury allows engineers to maintain maximum model weights for higher reasoning accuracy. Additionally, the massive VRAM allows for long-context tasks (128k+ tokens) without offloading to slower system RAM.
The HP ZGX Fury is not a consumer machine; it is a production-grade tool for specialized AI development.
When evaluating the HP ZGX Fury AI Station for AI development, it is helpful to weigh it against the most common alternatives, such as the NVIDIA DGX series and custom-built multi-GPU configurations.
The HP ZGX Fury AI Station is the definitive choice for practitioners who need the highest VRAM capacity and memory bandwidth available in a deskside format for the next generation of autonomous AI agents. The table below lists throughput and memory figures for a range of large open-weight models on this class of hardware.
| Model | Developer | Parameters | Throughput | Memory |
| --- | --- | --- | --- | --- |
| Llama 4 Maverick | Meta | 400B (17B active) | 39.1 tok/s | 146.4 GB |
| | | 70B | 50.7 tok/s | 112.8 GB |
| | | 70B | 50.7 tok/s | 112.8 GB |
| Nvidia Nemotron 3 Super | NVIDIA | 120B (12B active) | 55.2 tok/s | 103.5 GB |
| GLM-5 | Z.ai | 744B (40B active) | 65.2 tok/s | 87.7 GB |
| GLM-5.1 | Z.ai | 744B (40B active) | 65.2 tok/s | 87.7 GB |
| Kimi K2.6 | Moonshot AI | 1000B (32B active) | 66.3 tok/s | 86.2 GB |
| Kimi K2 Instruct 0905 | Moonshot AI | 1000B (32B active) | 67.6 tok/s | 84.6 GB |
| Kimi K2 Thinking | Moonshot AI | 1000B (32B active) | 67.6 tok/s | 84.6 GB |
| Kimi K2.5 | Moonshot AI | 1000B (32B active) | 67.6 tok/s | 84.6 GB |
| GLM-4.6 | Z.ai | 355B (32B active) | 81.3 tok/s | 70.3 GB |
| Mistral Large 3 675B | Mistral AI | 675B (41B active) | 86.3 tok/s | 66.3 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | 95.5 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | 95.5 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | 95.5 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | 95.5 tok/s | 59.8 GB |
| GLM-4.5 | Z.ai | 355B (32B active) | 110.3 tok/s | 51.8 GB |
| GLM-4.7 | Z.ai | 358B (32B active) | 108.6 tok/s | 52.6 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | 110.3 tok/s | 51.8 GB |
| | | 70B | 125.1 tok/s | 45.7 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | 124.2 tok/s | 46.0 GB |
| Llama 2 70B Chat | Meta | 70B | 131.7 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | 131.2 tok/s | 43.6 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | 126.5 tok/s | 45.2 GB |
| Qwen3-235B-A22B | Alibaba Cloud (Qwen) | 235B (22B active) | 157.3 tok/s | 36.3 GB |
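Single-stream decode on systems like this is typically memory-bandwidth-bound, so figures such as those above can be reasoned about with a simple roofline-style estimate: tokens per second is roughly the effective bandwidth divided by the bytes read per generated token. The sketch below uses a hypothetical effective-bandwidth figure purely for illustration; it is not a measured or official specification of the ZGX Fury.

```python
# Roofline-style decode estimate for a memory-bandwidth-bound system:
#   tokens/s ≈ effective_bandwidth / bytes_read_per_token
# EFFECTIVE_BW_GBPS is a hypothetical, illustrative figure.
EFFECTIVE_BW_GBPS = 5_700

def est_decode_tok_per_s(bytes_read_per_token_gb: float) -> float:
    return EFFECTIVE_BW_GBPS / bytes_read_per_token_gb

for label, gb_per_token in [("~60 GB read per token", 60.0),
                            ("~85 GB read per token", 85.0)]:
    print(f"{label}: ~{est_decode_tok_per_s(gb_per_token):.0f} tok/s")
```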