Tiny 12.8 cm Gorgon Point mini PC with 86 platform TOPS, 48 GB DDR5-5600, 2 TB PCIe 4.0 SSD, and OCuLink for eGPU expansion. Near-silent (under 36 dB) cooling.
Good balance for indie developers running local copilots and chat. 30B+ models are reachable but only with aggressive quantization and short context.
The Reatan X8 (Ryzen AI 9 HX 470 48GB) is a purpose-built mini PC for local AI inference, edge deployment, and on-device agentic workloads. At $999 MSRP, it sits in the prosumer tier—priced like a mid-range laptop but packing 86 platform TOPS, 48 GB of unified memory, and a Radeon 890M iGPU capable of running 13B parameter models at Q4-Q5 quantization entirely on-chip. Manufactured by Reatan, this tiny 12.8 cm chassis competes directly with other high-TOPS mini PCs like the Beelink SER10 Max and entry-level NVIDIA Jetson modules, but offers a critical advantage: OCuLink expansion for an external GPU when you need to scale beyond integrated graphics.
For practitioners, the X8 matters because it eliminates the trade-off between portability and AI performance. You can run local LLMs, RAG pipelines, and multimodal agents at the edge without a noisy desktop tower or a cloud subscription. The near-silent cooling (under 36 dB) means it’s suitable for 24/7 inference servers in an office or lab environment.
The X8 ships with 48 GB of DDR5-5600 in a single SO-DIMM module, with a second slot available for expansion up to 128 GB. For AI workloads, this unified memory pool serves as both system RAM and VRAM for the integrated Radeon 890M. While the iGPU can address up to 16 GB as dedicated VRAM (configurable in BIOS), the full 48 GB is available for model loading via CPU-side inference or shared memory.
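A quick back-of-the-envelope check clarifies what fits where; the KV-cache and runtime-overhead allowances in this sketch are illustrative assumptions, not measured values:

```python
# Rough fit check for the X8's memory split: the BIOS can dedicate up to
# 16 GB of the 48 GB pool to the Radeon 890M; the rest remains system RAM.

def fits_in_vram(weights_gb: float, kv_cache_gb: float = 1.5,
                 vram_gb: float = 16.0, overhead_gb: float = 0.5) -> bool:
    """True if weights + KV cache + runtime overhead fit in the VRAM window."""
    return weights_gb + kv_cache_gb + overhead_gb <= vram_gb

print(fits_in_vram(8.5))   # Llama 2 13B at Q4/Q5 (~8.5 GB): fits on-GPU
print(fits_in_vram(20.0))  # a 32B at Q4_K_M (~20 GB): overflows -> CPU or eGPU
```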
| Metric | Value |
|---|---|
| CPU | AMD Ryzen AI 9 HX 470 (12C/24T, 4 Zen 5 + 8 Zen 5c, up to 5.2 GHz) |
| iGPU | AMD Radeon 890M – 16 CUs @ 3.1 GHz (RDNA 3.5) |
| NPU | XDNA 2 – 55 TOPS (INT8) |
| Platform AI TOPS | 86 TOPS (combined CPU + GPU + NPU) |
| INT8 (GPU) | ~17.8 TOPS |
| TDP | 28W base / 54W configurable |
The Radeon 890M delivers roughly 8.9 TFLOPS (FP16) and 17.8 TOPS (INT8) for matrix operations, sufficient for real-time text generation with 7B–13B models. The XDNA 2 NPU adds 55 TOPS specifically for low-power, always-on inference tasks like keyword spotting or lightweight classification, but most practitioners will offload heavy LLM inference to the GPU.
At its 54 W configured TDP, the X8 achieves roughly 1.6 platform TOPS per watt (86 TOPS ÷ 54 W ≈ 1.59). This makes it one of the most energy-efficient options for running local LLMs at the edge. Compare a desktop RTX 4090 (450 W, ~330 TOPS INT8), which yields ~0.73 TOPS/W: the X8 is more than twice as efficient per watt. For always-on or battery-backed edge deployments, that difference matters.
The X8’s sweet spot is 7B–13B parameter models at Q4_K_M or Q5_K_M quantization. With 16 GB VRAM accessible to the GPU, you can comfortably load:

- Llama 2 7B Chat or Mistral 7B Instruct at Q4_K_M–Q5_K_M (~5–6.5 GB of weights)
- Llama 2 13B Chat at Q4_K_M–Q5_K_M (~8.5 GB of weights)
- Sparse MoE models such as Qwen3-30B-A3B at Q4 (~5.4 GB resident, 3B active parameters)

each with a few gigabytes to spare for KV cache.
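As a hedged starting point, here is a minimal llama-cpp-python sketch that loads a 13B Q4_K_M model fully onto the iGPU; the file path is a placeholder, and it assumes a llama.cpp build with ROCm/HIP or Vulkan support for the 890M:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,   # -1 offloads every layer to the GPU
    n_ctx=4096,        # keep context modest to stay inside 16 GB VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize OCuLink in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```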
For multimodal models, the 16 GB VRAM is tight but workable for LLaVA-1.6 7B (Q4_K_M) or Qwen-VL 7B (Q4_K_M). Expect ~20–30 tokens/sec with image inputs.
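For image inputs, llama-cpp-python's LLaVA chat handler is one workable route; the paths below are placeholders, and you need both the model GGUF and its matching mmproj projector file:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

llm = Llama(
    model_path="./llava-v1.6-7b.Q4_K_M.gguf",  # hypothetical path
    chat_handler=Llava15ChatHandler(clip_model_path="./mmproj-llava-7b.gguf"),
    n_gpu_layers=-1,
    n_ctx=2048,  # image tokens consume context quickly in 16 GB VRAM
)

out = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///tmp/board.png"}},
        {"type": "text", "text": "What component is highlighted?"},
    ],
}])
print(out["choices"][0]["message"]["content"])
```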
Connect an external GPU enclosure (e.g., with an RTX 4060 or RTX 4070) over OCuLink, and the X8 becomes capable of running 32B parameter models at Q3_K_M or 13B models at FP16. The OCuLink interface carries PCIe 4.0 x4, roughly 8 GB/s per direction, which is sufficient for inference workloads (weights stream to the card once at load time) and avoids the latency overhead of Thunderbolt tunneling.
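If llama.cpp enumerates both the iGPU and the eGPU (for example under a Vulkan build), the `tensor_split` parameter distributes weights across the two devices; the ratio below is an illustrative guess, not a tuned value:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-32b.Q3_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,
    tensor_split=[0.25, 0.75],  # weight fraction per device: iGPU, eGPU
    main_gpu=1,                 # index of the primary device (the eGPU here)
)
```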
With 48 GB system RAM, you can run 7B–13B models at Q8_0 via llama.cpp on CPU at ~8–12 tokens/sec, or 32B models at Q4_K_M at ~3–5 tokens/sec. This is useful for batch processing or when the GPU is occupied.
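A CPU-only run is the same call with GPU offload disabled; the model path is again a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q8_0.gguf",  # hypothetical path
    n_gpu_layers=0,   # 0 = pure CPU inference out of the 48 GB pool
    n_threads=12,     # one worker per physical core is a sane default
)
print(llm("Q: What is OCuLink? A:", max_tokens=64)["choices"][0]["text"])
```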
This is an inference-first device. The Radeon 890M lacks the VRAM and matrix-compute throughput to train anything larger than a 1B parameter model from scratch. Fine-tuning 7B models with LoRA is possible (using ~12 GB VRAM), but expect slow iteration times (2–3 hours per epoch). For training, pair the X8 with an eGPU.
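If you do attempt LoRA on-device, a sketch with Hugging Face's peft library shows the shape of it; the base model and hyperparameters here are illustrative and untested on this hardware:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumes you have access to the weights
    torch_dtype="auto",
)
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically <1% of total weights
```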
The SER10 Max is the closest competitor, sharing the same CPU and iGPU. Key differences:

- Price: the X8 starts lower at $999 MSRP.
- Memory: the SER10 Max ships with 64 GB out of the box, while the X8 ships with 48 GB in a single SO-DIMM and can be expanded to 128 GB later.
- Acoustics: the X8's cooling stays under 36 dB, giving it the quieter profile.
Pick the X8 if you want a lower starting price, quieter operation, and the ability to upgrade to 128 GB later. Pick the SER10 Max if you need 64 GB out of the box.
The Mac Mini M4 Pro delivers competitive AI inference (especially with MLX), but:

- Apple Silicon has no eGPU support, so there is no expansion path comparable to OCuLink.
- Memory is fixed at purchase; configuring 48 GB of unified memory pushes the price well past the X8's $999.
- macOS rules out x86-only tooling and many Linux-first deployment stacks.
Pick the X8 if you need more VRAM, eGPU expandability, or prefer x86 software compatibility. Pick the Mac Mini if you’re already in the Apple ecosystem and value macOS-specific optimizations.
Estimated throughput and memory footprint by model:

| Model | Maker | Params | Grade | Est. speed | Est. memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | B | 13.5 tok/s | 5.4 GB |
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | B | 8.5 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | B | 8.5 tok/s | 8.5 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | B | 6.4 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | B | 6.6 tok/s | 11.0 GB |
| Llama 2 13B Chat | Meta | 13B | B | 8.6 tok/s | 8.5 GB |
| – | – | 8B | B | 12.8 tok/s | 5.7 GB |
| – | – | 9B | B | 12.0 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | B | 15.1 tok/s | 4.8 GB |
| Gemma 4 E4B IT | Google | 4B | B | 10.5 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | B | 10.5 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | B | 11.3 tok/s | 6.4 GB |
| Gemma 4 E2B IT | Google | 2B | B | 19.5 tok/s | 3.7 GB |
| – | – | 8B | C | 5.4 tok/s | 13.3 GB |
| Qwen3.5-9B | Alibaba | 9B | F | 2.9 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 1.9 tok/s | 39.0 GB |
| Qwen3.6-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 1.7 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 0.9 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba | 32.8B | F | 1.3 tok/s | 53.9 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | F | 3.0 tok/s | 24.4 GB |
| LLaMA 65B | Meta | 65B | F | 1.8 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | F | 1.7 tok/s | 43.4 GB |
| – | – | 70B | F | 1.6 tok/s | 45.7 GB |