Flagship Strix Halo mini PC with 128GB unified LPDDR5X-8000 and Radeon 8060S. Unlocks the full 96GB GPU allocation — runs 70B at Q5 and Qwen3-235B sparse models locally. Won the American Good Design Platinum Award 2025.
Sized for serious local inference: 70B-class models at Q5 and sparse 200B+ MoE models at moderate quantization. Overkill for a casual homelab; the right call when the workload justifies keeping large models running locally around the clock.
The GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB) is a flagship mini PC purpose-built for local AI inference. It leverages AMD’s Strix Halo APU — a 16-core Zen 5 CPU paired with a Radeon 8060S iGPU and XDNA 2 NPU — to deliver 96 GB of unified GPU-allocatable memory in a compact, low-power chassis. At $1,999, it occupies the prosumer tier: more capable than a consumer laptop, far more energy-efficient than a multi-GPU workstation, and directly competitive with entry-level server-grade inference rigs.
This is not a general-purpose desktop. The EVO-X2 is engineered for one task: running large language models and other AI workloads locally, without cloud dependency. Its 128 GB of LPDDR5X-8000 unified memory can allocate up to 96 GB to the GPU, enabling models that would otherwise require a $10,000+ setup. The design won the American Good Design Platinum Award 2025, reflecting its industrial and thermal engineering — a metal chassis with triple-fan cooling keeps sustained loads under 70 °C.
For developers and researchers who need to run 70B-parameter models at Q5 quantization, or experiment with sparse 235B models, the EVO-X2 is currently the most accessible hardware that can do so in a desk-friendly form factor.
The headline spec is the 96 GB of GPU-accessible memory. This is not a discrete GPU with dedicated VRAM; it’s unified memory shared between the CPU and GPU via the Radeon 8060S iGPU (40 RDNA 3.5 compute units). The full 96 GB allocation is unlocked by default on this SKU — no BIOS tweaks or memory reservation tricks required. This is the largest unified memory pool available in any mini PC as of mid-2025.
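A quick way to confirm how much of the unified pool the GPU can actually see is to query it from a ROCm-enabled PyTorch build. A minimal sketch, assuming PyTorch with ROCm/HIP support is installed and the Radeon 8060S is device 0 (the expected-value comment is illustrative, not from the spec sheet):

```python
# Sketch: verify GPU-visible memory on the unified pool (assumes PyTorch + ROCm).
import torch

if not torch.cuda.is_available():  # ROCm builds expose HIP devices via the cuda API
    raise SystemExit("No ROCm/HIP device visible to PyTorch")

props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 1024**3
print(f"{props.name}: {total_gib:.1f} GiB GPU-allocatable")

# On this SKU the expectation is ~96 GiB; a much smaller number usually means the
# GPU memory allocation was lowered in firmware or the driver is falling back to GTT.
```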
The 128 GB of LPDDR5X-8000 runs on a 256-bit bus, delivering 256 GB/s of bandwidth. That's roughly a quarter of a desktop RTX 4090 (~1 TB/s) but more than enough for token generation at reasonable batch sizes. For context, a Mac M2 Ultra with 192 GB of unified memory peaks at 800 GB/s, but costs over $5,000. The EVO-X2's bandwidth is a limiting factor for very large models (235B+), but for the 70B–120B range it's well-matched to the compute throughput.
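Token generation is largely bandwidth-bound: each decoded token requires streaming roughly the full set of active weights once, so a back-of-the-envelope ceiling is bandwidth divided by the weight footprint. A rough sketch of that arithmetic using the 256 GB/s figure above; the bits-per-weight averages and the MoE active-parameter count are illustrative assumptions:

```python
# Rough decode-speed ceiling: tokens/s <= memory_bandwidth / bytes_read_per_token.
# Ignores KV-cache reads, kernel overhead, and compute limits, so real numbers land lower.
BANDWIDTH_GBPS = 256  # LPDDR5X-8000 on a 256-bit bus

def decode_ceiling(active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on tok/s given billions of active params and average bits/weight."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return BANDWIDTH_GBPS * 1e9 / bytes_per_token

cases = [
    ("7B dense @ ~Q4", 7, 4.5),
    ("70B dense @ ~Q5", 70, 5.5),
    ("235B MoE, ~22B active @ ~Q4", 22, 4.5),  # assumed MoE geometry, e.g. Qwen3-235B-A22B
]
for name, params, bits in cases:
    print(f"{name}: ~{decode_ceiling(params, bits):.0f} tok/s upper bound")
```

The ceilings line up with the measured numbers in this page: ~65 tok/s theoretical vs. ~50 tok/s observed for 7B, and ~5 tok/s theoretical vs. ~4.7 tok/s observed for dense 70B.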
The XDNA 2 NPU provides 50 TOPS for integer operations, but the real workhorse for LLM inference is the iGPU’s FP16/INT4 matrix units. The Radeon 8060S iGPU delivers approximately 12.5 TFLOPS (FP16) — comparable to an RTX 4060 Ti. Combined with the CPU’s 16 Zen 5 cores (boost up to 5.1 GHz), the system achieves ~50 tokens/s on 7B models and ~23 tokens/s on 20B models in real-world benchmarks (e.g., GPT-OSS-20B at 23.8 tok/s out of the box on Ubuntu).
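Reproducing a tokens-per-second figure locally is straightforward with a GGUF runtime. A hedged sketch using llama-cpp-python, assuming a build compiled with ROCm/HIP support and a GGUF model already on disk; the model path is a placeholder, not a real file:

```python
# Sketch: measure decode throughput locally (assumes a ROCm/HIP build of llama-cpp-python).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-20b.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer to the Radeon 8060S
    n_ctx=4096,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain unified memory in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```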
At 140W peak TDP, the EVO-X2 consumes less than a third of a typical gaming GPU. This makes it viable for edge deployment, home labs running 24/7, or offices where noise and heat matter. The triple-fan cooling keeps the APU under 70 °C during sustained inference, though brief throttling may occur after 15 minutes of peak load (per Mini PC Reviewer tests).
This is the critical question for any practitioner. The EVO-X2’s 96 GB unified memory and 256 GB/s bandwidth define a clear capability envelope.
| Model Size | Quantization | Approx. Memory Footprint | Feasibility |
|---|---|---|---|
| 7B | Q4_K_M | ~5 GB | Effortless, runs at >50 tok/s |
| 20B | Q5_K_M | ~14 GB | Fast (23–30 tok/s) |
| 70B | Q5_K_M | ~48 GB | Sweet spot — fits with headroom |
| 120B | Q4_K_M | ~72 GB | Possible with effort, ~15 tok/s |
| 235B | Sparse (50%) | ~96 GB | ~11 tok/s, requires sparse inference support |
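The footprints above can be approximated from parameter count and quantization width. A minimal sizing sketch, assuming GGUF-style quantization where Q4_K_M averages roughly 4.5 bits per weight and Q5_K_M roughly 5.5; these averages are rough assumptions, and a few extra GB of runtime buffers and KV cache must be added on top:

```python
# Rough weight-footprint estimate for a quantized model (GGUF-style average bits/weight).
BITS = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}  # approximate averages

def weight_gb(params_b: float, quant: str) -> float:
    """Weights only, params_b in billions; add runtime buffers and KV cache separately."""
    return params_b * 1e9 * BITS[quant] / 8 / 1e9

for params, quant in [(7, "Q4_K_M"), (70, "Q5_K_M"), (120, "Q4_K_M")]:
    print(f"{params}B {quant}: ~{weight_gb(params, quant):.0f} GB of weights")
# Compare each total against the 96 GB GPU-allocatable ceiling.
```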
The unified memory architecture excels at multimodal models (e.g., LLaVA, Qwen-VL) because image embeddings share the same pool without PCIe transfers. Long-context tasks (128K+ tokens) are feasible for models up to 70B; beyond that, context length must be reduced to avoid OOM.
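Long-context feasibility is mostly a KV-cache budget question: the cache grows linearly with context length, layer count, and KV heads. A worked sketch under assumed 70B-class dimensions (80 layers, 8 KV heads via GQA, head dim 128, FP16 cache); these dimensions are assumptions for illustration:

```python
# KV-cache footprint: 2 (K and V) * layers * kv_heads * head_dim * context * bytes/element.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# Assumed 70B-class geometry (GQA): 80 layers, 8 KV heads, head_dim 128.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gb(80, 8, 128, ctx):.1f} GB FP16 KV cache")
```

Under these assumptions, ~48 GB of Q5 weights plus ~43 GB of FP16 KV cache at 128K tokens lands near 91 GB, just inside the 96 GB ceiling, which is why 128K context tops out around the 70B class here; runtimes that quantize the KV cache to 8-bit roughly halve that overhead.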
The figures above, and the per-model table at the end of this section, are based on benchmarks from nishtahir.com and Mini PC Reviewer.
All figures are at batch size 1, the typical case for single-user local inference; at these speeds, interactive chat is comfortable on models up to 70B.
The GMKtec EVO-X2 is currently the most cost-effective way to get 96 GB of unified memory for AI inference in a compact, low-power package. It fills a gap between consumer laptops and enterprise workstations, making large local models accessible to individual developers and small teams.
Per-model throughput and memory footprint:

| Model | Developer | Parameters (active) | Rating | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | AA | 38.3 tok/s | 5.4 GB |
| | | 8B | AA | 36.4 tok/s | 5.7 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 43.0 tok/s | 4.8 GB |
| | | 9B | AA | 34.3 tok/s | 6.0 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 55.6 tok/s | 3.7 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | AA | 24.2 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | AA | 24.2 tok/s | 8.5 GB |
| Mistral 7B Instruct | Mistral AI | 7B | AA | 32.2 tok/s | 6.4 GB |
| Llama 2 13B Chat | Meta | 13B | AA | 24.3 tok/s | 8.5 GB |
| Gemma 4 E4B IT | Google | 4B | BB | 29.8 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | BB | 29.8 tok/s | 6.9 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | BB | 18.1 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | BB | 18.7 tok/s | 11.0 GB |
| GLM-4.5 | Z.ai | 355B (32B active) | BB | 4.0 tok/s | 51.8 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | BB | 4.0 tok/s | 51.8 GB |
| | | 70B | BB | 4.5 tok/s | 45.7 GB |
| GLM-4.7 | Z.ai | 358B (32B active) | BB | 3.9 tok/s | 52.6 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | BB | 4.5 tok/s | 46.0 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | BB | 4.6 tok/s | 45.2 GB |
| Llama 2 70B Chat | Meta | 70B | BB | 4.7 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | BB | 4.7 tok/s | 43.6 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | BB | 3.4 tok/s | 59.8 GB |