
A 5U enterprise-grade tower delivering 20 PFLOPS of AI compute, featuring closed-loop liquid cooling and dedicated BMC management.
The SuperMicro Super AI Station is a 5U enterprise-grade tower designed to bridge the gap between consumer-grade workstations and rack-mounted data center infrastructure. Built on the NVIDIA Blackwell Ultra (B300) architecture, this system is engineered specifically for the development and deployment of autonomous agents and frontier-class models. By packaging the GB300 Grace Blackwell Ultra Superchip into a deskside form factor, SuperMicro is targeting AI engineers and researchers who require data-center-level performance without the infrastructure requirements of a traditional server room.
Unlike standard workstations that rely on PCIe-based GPU expansion, the Super AI Station operates as a unified "AI Factory" in a box. It is positioned as the premier hardware for local AI agents, offering the compute density required for long-running autonomous workflows. While it competes with high-end Mac Studio configurations for developer mindshare, its raw throughput and 784GB of unified memory place it in a category of its own, outclassing even multi-GPU RTX 6000 Ada builds in memory bandwidth and total parameter capacity.
For AI inference, the critical bottlenecks are usually memory bandwidth and memory capacity. The Super AI Station addresses both directly with 784GB of high-bandwidth unified memory and a staggering 7100 GB/s of memory bandwidth. To put this in perspective, that is roughly 7x the bandwidth of a top-tier consumer GPU, which translates directly into superior tokens-per-second (TPS) for large-batch inference and high-concurrency agentic workloads.
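To see why bandwidth dominates token generation, here is a first-order model (a sketch under simplifying assumptions, not vendor math): a bandwidth-bound decoder must stream every active weight from memory for each generated token, so tokens-per-second is roughly effective bandwidth divided by bytes moved per token.

```python
# Back-of-the-envelope decode throughput for bandwidth-bound inference.
# Assumption: generating one token streams the active weights from
# memory once; real systems add KV-cache traffic and rarely sustain
# more than ~80% of peak bandwidth.

def est_tokens_per_sec(active_params_b: float, bytes_per_param: float,
                       bandwidth_gbs: float, efficiency: float = 0.8) -> float:
    gb_per_token = active_params_b * bytes_per_param   # GB moved per token
    return bandwidth_gbs * efficiency / gb_per_token

# 70B dense model at FP8 (1 byte/param) at 7100 GB/s:
print(est_tokens_per_sec(70, 1.0, 7100))   # ~81 tok/s
# Same model on a ~1000 GB/s consumer GPU (if the weights even fit):
print(est_tokens_per_sec(70, 1.0, 1000))   # ~11 tok/s
```

The same arithmetic explains why Mixture-of-Experts models are attractive on this hardware: only the active parameters cross the memory bus per token, so a 1T-parameter MoE can decode faster than a much smaller dense model.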
The Super AI Station features a 1600W Titanium Level power supply with 94% efficiency, allowing it to run on a conventional 20A circuit. The integrated closed-loop liquid cooling system is a critical design choice for practitioners; it enables the system to maintain peak performance during sustained training or long-context inference while remaining quiet enough for a shared office environment. Additionally, the inclusion of a dedicated BMC (Baseboard Management Controller) allows for enterprise-level remote management, a feature typically absent from consumer AI PCs.
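In practice, BMC access means the station can be racked in a closet and administered like any server. Supermicro BMCs speak the standard DMTF Redfish REST API alongside IPMI; the sketch below shows a health check over Redfish, where the host address, credentials, and resource path are illustrative placeholders to be checked against the board's BMC documentation.

```python
# Query power state and health from the BMC over Redfish.
# Address and credentials are hypothetical placeholders.
import requests

BMC = "https://10.0.0.50"          # hypothetical BMC address
AUTH = ("ADMIN", "change-me")      # hypothetical credentials

# verify=False skips TLS validation for a self-signed BMC cert;
# install the BMC certificate instead in production.
resp = requests.get(f"{BMC}/redfish/v1/Systems/1",
                    auth=AUTH, verify=False, timeout=10)
resp.raise_for_status()
system = resp.json()
print(system["PowerState"])        # e.g. "On"
print(system["Status"]["Health"])  # e.g. "OK"
```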
The 784GB memory capacity makes the Super AI Station one of the few deskside solutions capable of running 1-trillion parameter models locally. This is a significant milestone for privacy-conscious enterprises and researchers who cannot risk sending proprietary data to cloud-based APIs.
The massive memory headroom is particularly beneficial for long-context tasks (e.g., analyzing 100k+ token documents). While consumer cards like the RTX 4090 (24GB) struggle as the context window grows, the Super AI Station accommodates massive context windows in models like Qwen 2.5 or Mixtral 8x22B without hitting OOM (Out of Memory) errors. For multimodal workloads, such as local video-to-text or high-resolution image generation (Stable Diffusion 3 / Flux.1), the 5000 TFLOPS of FP16 performance delivers near-instantaneous generation.
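The reason long contexts punish small cards is the KV cache, which grows linearly with context length on top of the static weights. A minimal sizing sketch, assuming a Llama-3-70B-style geometry (80 layers, 8 GQA key/value heads, head dim 128, FP16 cache); other architectures will differ:

```python
# KV-cache sizing: the memory that grows with context length.
# Geometry matches a Llama-3-70B-like model; adjust per model.

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V
    return tokens * per_token / 1e9

print(kv_cache_gb(8_000))     # ~2.6 GB -- manageable on a 24 GB card
print(kv_cache_gb(128_000))   # ~42 GB  -- alone exceeds an RTX 4090
```

At 128k tokens the cache alone outgrows any consumer card before a single weight is loaded, while it consumes only a few percent of this machine's capacity.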
The SuperMicro Super AI Station is not a general-purpose workstation; it is a specialized tool for high-throughput AI production.
The primary use case for this hardware is running autonomous agents. Using frameworks like NVIDIA NemoClaw, developers can deploy agents that operate 24/7. The high memory bandwidth allows the system to handle the rapid-fire reasoning loops required for agents to browse the web, write code, and execute tasks without the latency of cloud round-trips.
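Framework aside, the core of such an agent is a tight generate-act-observe loop against a locally served model. A minimal, framework-agnostic sketch, assuming an OpenAI-compatible local endpoint (which common local servers such as vLLM expose); the endpoint, model name, tool handling, and stop condition are placeholders:

```python
# Minimal agent loop against a locally served model.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical
history = [{"role": "system", "content": "You are a coding agent."},
           {"role": "user", "content": "Summarize open TODOs in ./src"}]

for step in range(10):                        # bounded reasoning loop
    r = requests.post(ENDPOINT, json={"model": "local-model",
                                      "messages": history}, timeout=120)
    reply = r.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    if "DONE" in reply:                       # toy stop condition
        break
    # Parse tool calls from `reply`, execute them locally, and feed
    # the observation back as the next turn (elided for brevity).
    history.append({"role": "user", "content": "observation: ..."})
```

Because every round-trip is local, each reasoning step costs milliseconds of network latency instead of the hundreds of milliseconds a cloud API adds, which compounds quickly over thousands of loop iterations.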
While primarily marketed for inference, the 5000 TFLOPS of FP16 compute makes this an exceptional machine for fine-tuning. ML researchers can perform full-parameter fine-tuning on 70B models or extensive LoRA training on 400B+ models locally. This is a game-changer for teams working with sensitive datasets in healthcare, finance, or defense.
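Whether a given fine-tune fits is simple arithmetic over optimizer state. As a sketch: standard mixed-precision Adam costs roughly 16 bytes per parameter (BF16 weights and gradients plus FP32 master weights and two moment tensors), so a 70B full fine-tune only fits here with memory-saving optimizer states or offloading:

```python
# Optimizer-state arithmetic for full-parameter fine-tuning
# (weights + grads + optimizer state only; activations are extra).

def train_memory_gb(params_b: float, bytes_per_param: float) -> float:
    return params_b * bytes_per_param

# Mixed-precision Adam: BF16 weights (2) + BF16 grads (2)
# + FP32 master copy (4) + two FP32 moments (8) = 16 B/param.
print(train_memory_gb(70, 16))   # 1120 GB -- over budget even here
# BF16 weights/grads + 8-bit optimizer states = ~6 B/param.
print(train_memory_gb(70, 6))    # 420 GB -- fits, with room for activations
```

LoRA flips the equation: only the adapter parameters carry optimizer state, so a 400B+ base model in low precision plus adapters stays comfortably within the unified pool.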
For small AI startups or research labs, the Super AI Station can serve as a centralized inference server. The dedicated BMC and enterprise networking capabilities allow it to be partitioned or shared across a team, providing a "private cloud" experience where multiple developers can run inference against shared local weights.
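From a teammate's laptop, the station then looks like any hosted API. A sketch of the client side, assuming the box runs an OpenAI-compatible server; the hostname, port, and model name are placeholders:

```python
# Consuming the shared station with a standard OpenAI-compatible client.
from openai import OpenAI

client = OpenAI(base_url="http://ai-station.local:8000/v1",  # hypothetical host
                api_key="not-needed-locally")

out = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(out.choices[0].message.content)
```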
When evaluating the Super AI Station, practitioners typically look at three alternatives: high-end Mac Studios, DIY multi-GPU builds, and enterprise rack servers.
The Mac Studio is a popular choice for local LLMs due to its unified memory (up to 192GB). However, the Super AI Station offers roughly 4x the memory capacity and significantly higher compute throughput. While the Mac is a silent consumer device, the Super AI Station is a production-grade machine capable of running 1T models that the Mac simply cannot fit in memory.
A workstation with four RTX 6000 Ada cards provides 192GB of VRAM. To match the 784GB of the Super AI Station, you would need multiple linked workstations, which introduces severe communication bottlenecks: the RTX 6000 Ada lacks NVLink, leaving only PCIe and network links between shards. The Super AI Station’s Blackwell Ultra architecture provides a unified memory pool and bandwidth that a multi-GPU PCIe setup cannot replicate, making it the superior choice for high-throughput AI inference and large-scale model development.
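The gap is easy to quantify with nominal link rates (a sketch, not measured throughput): tensor-parallel shards exchange activations at every layer, and a PCIe 5.0 x16 link moves roughly 64 GB/s per direction versus multiple terabytes per second for on-package memory.

```python
# Nominal link rates only; real-world throughput is lower still.
LOCAL_HBM = 7100     # GB/s, on-package memory (Super AI Station spec)
PCIE5_X16 = 64       # GB/s per direction, PCIe 5.0 x16

print(f"Interconnect is ~{LOCAL_HBM / PCIE5_X16:.0f}x slower than local memory")
# => ~111x; any layer that waits on the link stalls the whole pipeline.
```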
The Super AI Station brings DGX-level performance to a deskside form factor. While a DGX is designed for a data center rack with 3-phase power and industrial cooling, the Super AI Station is optimized for the "Edge" and "Prosumer" markets, offering a plug-and-play experience on standard NEMA 5-20 outlets. It is the logical choice for teams that need data center power but lack the facilities to house a traditional server.
The table below summarizes per-model inference benchmarks on the Super AI Station:

| Model | Developer | Parameters (Active) | Mode | Throughput | Memory |
|---|---|---|---|---|---|
| Llama 4 Maverick | Meta | 400B (17B active) | SS | 39.1 tok/s | 146.4 GB |
| — | — | 70B | SS | 50.7 tok/s | 112.8 GB |
| Nvidia Nemotron 3 Super | NVIDIA | 120B (12B active) | SS | 55.2 tok/s | 103.5 GB |
| GLM-5 | Z.ai | 744B (40B active) | SS | 65.2 tok/s | 87.7 GB |
| GLM-5.1 | Z.ai | 744B (40B active) | SS | 65.2 tok/s | 87.7 GB |
| Kimi K2.6 | Moonshot AI | 1000B (32B active) | SS | 66.3 tok/s | 86.2 GB |
| Kimi K2 Instruct 0905 | Moonshot AI | 1000B (32B active) | SS | 67.6 tok/s | 84.6 GB |
| Kimi K2 Thinking | Moonshot AI | 1000B (32B active) | SS | 67.6 tok/s | 84.6 GB |
| Kimi K2.5 | Moonshot AI | 1000B (32B active) | SS | 67.6 tok/s | 84.6 GB |
| GLM-4.6 | Z.ai | 355B (32B active) | SS | 81.3 tok/s | 70.3 GB |
| Mistral Large 3 675B | Mistral AI | 675B (41B active) | SS | 86.3 tok/s | 66.3 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | SS | 95.5 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | SS | 95.5 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | SS | 95.5 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | SS | 95.5 tok/s | 59.8 GB |
| GLM-4.5 | Z.ai | 355B (32B active) | SS | 110.3 tok/s | 51.8 GB |
| GLM-4.7 | Z.ai | 358B (32B active) | SS | 108.6 tok/s | 52.6 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | SS | 110.3 tok/s | 51.8 GB |
| — | — | 70B | SS | 125.1 tok/s | 45.7 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | SS | 124.2 tok/s | 46.0 GB |
| Llama 2 70B Chat | Meta | 70B | SS | 131.7 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | SS | 131.2 tok/s | 43.6 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | SS | 126.5 tok/s | 45.2 GB |
| Qwen3-235B-A22B | Alibaba Cloud (Qwen) | 235B (22B active) | SS | 157.3 tok/s | 36.3 GB |
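One sanity check worth noting (our reading of the figures, not a vendor claim): if the memory column is interpreted as data moved per generated token, throughput times memory is nearly constant across every row, which is exactly the signature of bandwidth-bound decoding at an effective ~5.7 TB/s, about 80% of the 7,100 GB/s peak.

```python
# Check the inverse relationship between the throughput and memory
# columns above (values copied from four sample rows of the table).
rows = [(39.1, 146.4), (66.3, 86.2), (95.5, 59.8), (157.3, 36.3)]
for tps, gb in rows:
    print(f"{tps * gb:,.0f} GB/s effective")   # ~5,715 GB/s in every row
```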