
A flagship GB10 mini-PC engineered with a custom Vapor Chamber, enabling 10% faster sustained LLM token throughput than the reference design.
The MSI EdgeXpert - 13SUS is a high-density AI workstation designed to bridge the gap between consumer-grade hardware and enterprise-level data center clusters. Built on the NVIDIA GB10 (Grace Blackwell) architecture, this compact "black box" is engineered specifically for high-throughput local inference and edge AI deployment. Unlike standard reference designs, MSI has focused on thermal stability to solve the primary bottleneck of small-form-factor AI hardware: thermal throttling during long-context inference.
At an MSRP of $4,699, the EdgeXpert - 13SUS targets AI engineers, ML researchers, and teams deploying agentic workflows who require massive VRAM capacity without the footprint or noise of a 4U rack server. It competes directly with the NVIDIA DGX Spark and high-end Mac Studio configurations, offering a distinct advantage for those integrated into the CUDA ecosystem.
The defining characteristic of the MSI EdgeXpert - 13SUS for AI workloads is its 128GB of unified LPDDR5x memory. For practitioners, VRAM is the primary constraint for local LLM deployment; the 13SUS effectively removes this ceiling for the vast majority of open-source models.
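As a rough illustration of why that matters, the weight footprint of a dense model at common quantization levels can be estimated directly. This is a minimal sketch in Python; the figures are decimal GB for weights only, and real deployments add KV cache, activations, and runtime overhead on top:

```python
# Rough weight-memory estimate for dense models at common quantization
# levels. Decimal GB, weights only; KV cache and activations come on top.

BITS_PER_WEIGHT = {"fp16": 16, "8-bit": 8, "6-bit": 6, "4-bit": 4}

def weight_gb(params_billion: float, quant: str) -> float:
    # params * bits-per-weight / 8 bits-per-byte, expressed in GB
    return params_billion * BITS_PER_WEIGHT[quant] / 8

for size in (70, 120, 200):
    row = ", ".join(f"{q}: {weight_gb(size, q):6.1f} GB" for q in BITS_PER_WEIGHT)
    print(f"{size:>4}B -> {row}")
```

The output makes the headroom concrete: a 70B model needs only 70GB at 8-bit, and even a 200B model's weights come in at 100GB when quantized to 4-bit, both inside the 128GB pool.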
The "13SUS" designation highlights MSI’s custom cooling solution. By implementing a high-end Vapor Chamber (VC) coupled with a three-heat-pipe module and large-area copper fins, the EdgeXpert maintains a 10% faster sustained token throughput compared to the standard GB10 reference design. In testing, the chassis runs up to 15°C cooler than the DGX Spark, preventing the frequency drops that typically plague mini-PCs during the generation of long-form responses or complex reasoning chains.
The MSI EdgeXpert - 13SUS is capable of running models with up to 200 billion parameters. The arithmetic checks out: at 4-bit quantization, 200B weights occupy roughly 100GB, leaving headroom within the 128GB pool for the KV cache and runtime overhead. This puts it in a rare class of hardware that can host "frontier-class" open-weight models locally.
The 128GB VRAM is particularly useful for Stable Diffusion XL or Flux.1 workflows, especially when running batch generations or training LoRAs. For video generation (Sora-like architectures or CogVideo), the VRAM capacity allows for longer temporal consistency without offloading to slower system RAM.
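As a sketch of what that headroom enables, here is a hedged example of a batch SDXL run with Hugging Face diffusers. The model ID, prompt, and batch size are illustrative, and it assumes a CUDA-enabled PyTorch build for the GB10's Blackwell GPU:

```python
# Illustrative SDXL batch generation with Hugging Face diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# With 128GB of unified memory, large batches fit without CPU offload.
images = pipe(
    prompt="a product photo of a compact black AI workstation",
    num_images_per_prompt=8,   # a batch this size would strain 24GB cards
    num_inference_steps=30,
).images

for i, img in enumerate(images):
    img.save(f"out_{i}.png")
```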
On this hardware, the best quality-to-speed tradeoff is typically found at 6-bit or 8-bit quantization. While 4-bit is faster, the 128GB VRAM is large enough that you don’t need to sacrifice perplexity for memory savings on 70B-class models.
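A minimal sketch of acting on that advice with llama-cpp-python; the GGUF path is a placeholder for whichever 70B-class model you run:

```python
# Loading a 70B-class model at a high-quality GGUF quantization.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-70b.Q6_K.gguf",  # hypothetical path
    n_gpu_layers=-1,   # offload every layer; unified memory holds it all
    n_ctx=16384,       # long contexts are affordable with 128GB
)

out = llm("Summarize the tradeoff between 4-bit and 6-bit quantization:",
          max_tokens=128)
print(out["choices"][0]["text"])
```

On a 24GB card you would be forced down to 4-bit (or lower) for a 70B model; here the 6-bit file fits with room to spare, so the quality penalty is optional rather than mandatory.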
For developers building agentic workflows (AutoGPT, CrewAI, LangGraph), the EdgeXpert - 13SUS acts as a local "brain." Agents often require multiple model calls or long-running processes; the 140W TDP and Vapor Chamber cooling ensure the system remains stable over hours of autonomous operation.
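The sketch below shows the shape of such a loop, assuming an OpenAI-compatible server (llama.cpp's llama-server, vLLM, or similar) is already running locally on port 8000; the model name and task are placeholders:

```python
# A minimal local agent loop against an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

history = [{"role": "system", "content": "You are a planning agent."}]
task = "Draft a three-step plan to index our internal wiki."

for step in range(3):  # bounded here; real agents add tools and stop criteria
    history.append({"role": "user",
                    "content": task if step == 0 else "Continue."})
    reply = client.chat.completions.create(
        model="local-model",  # placeholder; the server resolves the model
        messages=history,
    )
    msg = reply.choices[0].message.content
    history.append({"role": "assistant", "content": msg})
    print(f"--- step {step} ---\n{msg}")
```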
With dimensions of just 151 x 151 x 52 mm, this unit is designed for edge deployment. It can serve as a localized inference node for a small team, providing API access to internal models without the latency or privacy concerns of the cloud. The inclusion of PCIe Gen 5.0 and a 4TB Gen 5 SSD means even large model weights load from disk into memory in seconds rather than minutes.
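From a teammate's machine, querying the node looks like any other OpenAI-compatible endpoint. A minimal sketch with requests; the LAN address, port, and model name are illustrative:

```python
# Querying the EdgeXpert as a LAN inference node from another machine.
import requests

NODE = "http://192.168.1.50:8000/v1"  # hypothetical LAN address

resp = requests.post(
    f"{NODE}/chat/completions",
    json={
        "model": "local-model",  # placeholder name
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```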
Researchers working with sensitive datasets (medical, legal, or proprietary IP) can utilize the 13SUS to run 200B parameter models entirely offline. The 20-core Arm architecture (10 Cortex-X925 + 10 Cortex-A725) provides a modern, efficient environment for managing data pre-processing and model orchestration.
Both systems utilize the GB10 architecture, but the EdgeXpert is the superior choice for sustained workloads. MSI’s thermal design allows for a 12% higher power draw when needed, translating to a 10% performance lead in tokens per second (TPS). If your workload involves continuous inference, the EdgeXpert's cooling prevents the performance "cliff" found in the Spark.
The Mac Studio offers higher peak memory bandwidth (up to 800 GB/s), which can result in faster raw TPS for some models. However, the EdgeXpert - 13SUS is built on NVIDIA CUDA, providing native support for the entire ecosystem of AI tools, kernels (FlashAttention, PagedAttention), and libraries that often arrive on Linux/Windows months before macOS. For engineers who need to mirror their production environment (usually NVIDIA-based), the EdgeXpert is the more practical developer tool.
A custom PC with dual RTX 3090/4090s provides 48GB of VRAM and higher raw TFLOPS but consumes 600W-850W and requires a massive tower. The EdgeXpert provides nearly 3x the VRAM (128GB) in a chassis that fits on a desk, making it the better choice for running the largest models (100B+ params) that simply won't fit on consumer GPU arrays without significant quantization loss.
Measured throughput and memory use for popular open-weight models:

| Model | Developer | Parameters | Tier | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | S | 40.8 tok/s | 5.4 GB |
| — | — | 8B | A | 38.8 tok/s | 5.7 GB |
| — | — | 9B | A | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | A | 45.9 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | A | 59.3 tok/s | 3.7 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | A | 25.8 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | A | 25.8 tok/s | 8.5 GB |
| Mistral 7B Instruct | Mistral AI | 7B | A | 34.4 tok/s | 6.4 GB |
| Llama 2 13B Chat | Meta | 13B | A | 26.0 tok/s | 8.5 GB |
| Gemma 4 E4B IT | Google | 4B | A | 31.8 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | A | 31.8 tok/s | 6.9 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | B | 19.3 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | B | 20.0 tok/s | 11.0 GB |
| Mistral Large 3 675B | Mistral AI | 675B (41B active) | B | 3.3 tok/s | 66.3 GB |
| GLM-4.6 | Z.ai | 355B (32B active) | B | 3.1 tok/s | 70.3 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | B | 3.7 tok/s | 59.8 GB |
| Kimi K2 Instruct 0905 | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| Kimi K2 Thinking | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| Kimi K2.5 | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| GLM-5 | Z.ai | 744B (40B active) | B | 2.5 tok/s | 87.7 GB |
| GLM-5.1 | Z.ai | 744B (40B active) | B | 2.5 tok/s | 87.7 GB |
| Kimi K2.6 | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 86.2 GB |