Ultra-compact 9.7-inch Strix Halo desktop with 128GB LPDDR5X-8000 8-channel memory and Radeon 8060S iGPU. Triple-fan, 5 heat-pipe cooling sustains 120W for local 70B model inference.
Sized for local serving of quantized 70B–120B class models. Overkill for a homelab; the right call when the workload pays for itself in token volume.
The NIMO Mini PC (Ryzen AI Max+ 395 128GB) is a compact desktop built around AMD’s Strix Halo APU—the Ryzen AI Max+ 395. It combines a 16-core Zen 5 CPU, a Radeon 8060S iGPU with 40 RDNA 3.5 compute units, and a dedicated XDNA 2 NPU rated at 50 TOPS. The headline feature is 128 GB of unified LPDDR5X-8000 memory, configurable as up to 96 GB of VRAM for GPU workloads. At $2,299, this is a prosumer-grade machine that targets a specific niche: local inference of large language models (70B–120B parameters) on a desktop that fits in a 9.7 × 7.4 × 3.8-inch chassis.
It competes with systems like the Framework Desktop (with AMD Strix Halo) and high-end Mini PCs using discrete GPUs (e.g., Intel NUC 13 Extreme with RTX 4060), but the NIMO’s advantage is unified memory that sidesteps the PCIe bottleneck between CPU and GPU. For AI engineers and hobbyists who need to run quantized 70B models at usable speeds without a tower-sized GPU rig, this is a clean, efficient alternative.
What matters for AI inference on this machine comes down to memory: capacity determines which models load at all, and bandwidth determines decode speed.
Compared to alternatives: A Mac Mini M4 Pro with 48 GB unified memory costs about the same but tops out at 48 GB for models, albeit with slightly higher memory bandwidth (273 GB/s on the M4 Pro vs. 256 GB/s here). The NIMO offers double the memory headroom for model size. Against a NUC 13 Extreme with an RTX 4060 (8 GB VRAM), it's no contest: the NIMO can run models the NUC can't touch.
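The bandwidth figure follows directly from the memory configuration; a quick sanity check (assuming Strix Halo's 8 x 32-bit LPDDR5X channels, i.e. a 256-bit effective bus):

```python
# Peak memory bandwidth = transfer rate x bus width.
# Assumption: 8 LPDDR5X channels x 32 bits = 256-bit effective bus.
transfers_per_s = 8000e6           # LPDDR5X-8000: 8000 MT/s
bus_width_bytes = 256 // 8         # 256-bit bus -> 32 bytes per transfer
bandwidth_gbs = transfers_per_s * bus_width_bytes / 1e9
print(f"{bandwidth_gbs:.0f} GB/s")  # matches the 256 GB/s cited here
```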
This machine’s primary strength is running local LLMs that require 32–96 GB of VRAM. Here’s the breakdown:
Sweet spot: 70B models at Q5_K_M (medium quantization). You get near-original quality at roughly 4–5 tokens/second, consistent with the throughput figures listed below; adequate for interactive use. For agents or batch generation, raising the batch size to 2 (if the context fits) can increase aggregate throughput.
VRAM allocation: Use AMD’s Adrenalin software or BIOS to reserve 96 GB for the iGPU. The remaining 32 GB is for system tasks. That’s the recommended split for heavy inference.
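Both points above reduce to memory arithmetic: decode speed for a dense model is bounded by how fast the weights stream from RAM, and the 96 GB reservation must hold weights plus KV cache. A minimal sketch; the 70% bandwidth-efficiency factor and 2 GB runtime overhead are assumptions, and the KV formula assumes an FP16 cache with Llama-2-70B-like geometry (grouped-query attention):

```python
# Decode-speed estimate: bandwidth-bound generation streams all weights per token.
bandwidth_gbs = 256      # theoretical peak, LPDDR5X-8000 on a 256-bit bus
efficiency = 0.7         # assumed achievable fraction of peak
model_gb = 46.5          # ~70B params at Q5_K_M (~5.3 bits/param)
tokens_per_s = bandwidth_gbs * efficiency / model_gb

# KV-cache size: K and V vectors per layer, per token (FP16 = 2 bytes/elem).
def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_len=8192)
fits = model_gb + kv + 2 <= 96   # 2 GB assumed runtime/activation overhead

print(f"~{tokens_per_s:.1f} tok/s, KV cache {kv:.1f} GB, fits in 96 GB: {fits}")
```

This lands at roughly 4 tok/s with a couple of GB of KV cache at an 8K context, comfortably inside the 96 GB reservation.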
This Mini PC is not for everyone. It's built for people who need large models running locally and will trade peak token speed for memory capacity. How it stacks up:
vs. Apple Mac Mini M4 Pro (48 GB, $2,199)
vs. Framework Desktop (configured with same Strix Halo, ~$2,500)
vs. Intel NUC 13 Extreme with RTX 4060 (8 GB VRAM, ~$1,800)
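One way to frame the comparisons above is price per GB of GPU-addressable memory, using the list prices quoted here (the NUC figure counts only discrete VRAM; the rounded dollar amounts are just this arithmetic, not quoted specs):

```python
# Price per GB of GPU-addressable memory, from the list prices above.
options = {
    "NIMO (96 GB VRAM reservation)": (2299, 96),
    "Mac Mini M4 Pro (48 GB unified)": (2199, 48),
    "NUC 13 Extreme + RTX 4060 (8 GB)": (1800, 8),
}
for name, (price, gb) in options.items():
    print(f"{name}: ${price / gb:.0f}/GB")
```

The NIMO works out to roughly $24/GB, against roughly $46/GB for the Mac Mini and $225/GB for the discrete-GPU NUC.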
Bottom line: If your AI work requires running 70B–120B parameter models locally, this is one of the most cost-effective and space-efficient options available at $2,299. If you can compromise on model size, other hardware may offer faster token speeds for smaller models.
Throughput by model on this machine, as listed:

| Model | Vendor | Parameters | Rating | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | AA | 38.3 tok/s | 5.4 GB |
| | | 8B | AA | 36.4 tok/s | 5.7 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 43.0 tok/s | 4.8 GB |
| | | 9B | AA | 34.3 tok/s | 6.0 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 55.6 tok/s | 3.7 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | AA | 24.2 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | AA | 24.2 tok/s | 8.5 GB |
| Mistral 7B Instruct | Mistral AI | 7B | AA | 32.2 tok/s | 6.4 GB |
| Llama 2 13B Chat | Meta | 13B | AA | 24.3 tok/s | 8.5 GB |
| Gemma 4 E4B IT | Google | 4B | BB | 29.8 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | BB | 29.8 tok/s | 6.9 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | BB | 18.1 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | BB | 18.7 tok/s | 11.0 GB |
| GLM-4.5 | Z.ai | 355B (32B active) | BB | 4.0 tok/s | 51.8 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | BB | 4.0 tok/s | 51.8 GB |
| | | 70B | BB | 4.5 tok/s | 45.7 GB |
| GLM-4.7 | Z.ai | 358B (32B active) | BB | 3.9 tok/s | 52.6 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | BB | 4.5 tok/s | 46.0 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | BB | 4.6 tok/s | 45.2 GB |
| Llama 2 70B Chat | Meta | 70B | BB | 4.7 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | BB | 4.7 tok/s | 43.6 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | BB | 3.4 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | BB | 3.4 tok/s | 59.8 GB |