Reatan Gorgon Point creator variant with built-in speaker, dual-microphone array, and OCuLink in an all-metal chassis. 48GB DDR5-5600 + 2TB SSD for always-on AI sidekick workloads.
Good balance for indie developers running local copilots and chat. 30B+ models are reachable but only with aggressive quantization and short context.
The Reatan Mini Gaming PC (Ryzen AI 9 HX 470 with Speaker) is a compact edge AI workstation built around AMD’s latest Gorgon Point silicon. At $999, it targets the prosumer sweet spot—offering 55 TOPS of dedicated NPU performance, 16GB of unified VRAM, and a unique set of peripherals (built-in speaker, dual-mic array) that make it suitable for always-on, voice-interactive AI agents.
This is not a data center GPU server. It’s a self-contained mini PC designed for local inference workloads where power efficiency, form factor, and low latency matter more than raw throughput. The all-metal chassis houses a 12-core/24-thread CPU, Radeon 890M iGPU (RDNA 3.5), and XDNA 2 NPU delivering a combined 86 TOPS across the platform. For AI practitioners, the key differentiator is the OCuLink expansion port—allowing direct connection to an external GPU enclosure, bypassing USB/Thunderbolt bottlenecks.
Reatan positions this against the Minisforum AI X1 Pro and similar Ryzen AI 9 HX 470 mini PCs, but adds the integrated speaker and microphone array for voice-first interactions. This makes it a credible candidate for local AI sidekicks, voice-controlled agents, and edge deployments where a traditional desktop would be overkill.
The Reatan ships with 48GB DDR5-5600 in dual-channel configuration. This is critical: single-channel RAM (as seen in some competing HX 470 mini PCs) cuts memory bandwidth by roughly half, crippling iGPU-bound inference. The dual-channel setup here delivers 90 GB/s memory bandwidth—adequate for 13B parameter models at Q4-Q5 quantization.
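That 90 GB/s figure falls straight out of the DDR5 math; a quick sanity check:

```python
# Back-of-envelope check of the dual-channel DDR5-5600 bandwidth figure.
# DDR5 moves 8 bytes per channel per transfer; "5600" is mega-transfers/sec.
transfers_per_sec = 5600e6
bytes_per_transfer = 8          # one 64-bit channel
channels = 2                    # dual-channel configuration
bandwidth = transfers_per_sec * bytes_per_transfer * channels
print(f"{bandwidth / 1e9:.1f} GB/s")  # -> 89.6 GB/s, matching the ~90 GB/s spec
```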
VRAM: 16GB (shared system memory, allocated to the Radeon 890M iGPU)
This is your hard limit for model loading. The key platform numbers:
| Metric | Value |
|---|---|
| INT8 NPU TOPS | 55 |
| Combined Platform AI | 86 TOPS |
| GPU Compute Units | 16 (RDNA 3.5) |
| TDP (configurable) | 28W base / 54W peak |
| Memory Bandwidth | 90 GB/s |
The Radeon 890M delivers roughly 8-9 TFLOPS FP16, comparable to an RTX 3050 mobile. For transformer inference, the bottleneck is memory bandwidth, not compute. Expect 30-50 tokens/second on 7B models at Q4, and 15-25 tokens/second on 13B models at Q4.
At 54W peak, this is one of the most power-efficient AI inference platforms available. For comparison, a typical RTX 4060 desktop GPU draws 115W. Running a local LLM server 24/7 on the Reatan costs roughly $0.15/day at average electricity rates. This makes it viable for always-on agentic workloads that would be wasteful on a full desktop.
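The cost claim is easy to verify. A minimal sketch, assuming a roughly average US residential rate of $0.12/kWh (the exact rate is an assumption; plug in your own tariff):

```python
# Daily electricity cost for a 24/7 inference box pinned at its 54W peak TDP.
watts = 54
kwh_per_day = watts * 24 / 1000          # 1.296 kWh/day
rate_per_kwh = 0.12                      # assumed average rate, USD
print(f"${kwh_per_day * rate_per_kwh:.2f}/day")  # -> $0.16/day; ~$0.15 at slightly lower rates
```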
| Model | Quantization | VRAM Usage | Expected Tokens/sec |
|---|---|---|---|
| Llama 3.1 8B | Q4_K_M | ~5.5 GB | 35-45 |
| Mistral 7B v0.3 | Q8_0 | ~7 GB | 25-35 |
| Qwen 2.5 7B | Q4_K_M | ~5 GB | 35-45 |
| DeepSeek-Coder-V2 Lite | Q4_K_M | ~12 GB | 15-20 |
| Llama 2 13B | Q4_K_M | ~9 GB | 18-25 |
| Qwen 2.5 14B | Q4_K_M | ~10 GB | 15-22 |
| Phi-3 Medium 14B | Q4_K_M | ~9.5 GB | 18-25 |
Sweet spot: 7B-8B models at Q4_K_M. This gives you 4K-8K context windows with acceptable speed for interactive use. For 13B models at Q4, expect slower but usable throughput—fine for batch processing or non-real-time tasks.
Multimodal models: LLaVA 1.6 7B runs comfortably. LLaVA 13B is possible at Q4 but leaves little headroom for image embeddings. Long-context tasks (32K+ tokens) will overflow the 16GB VRAM for 13B+ models.
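For a concrete starting point, here is a minimal sketch of the 8B-at-Q4 sweet spot using llama-cpp-python; the GGUF path is a placeholder, and the flags assume a GPU-enabled build (e.g., Vulkan) so layers can offload to the 890M:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,          # the 4K-8K context sweet spot discussed above
    n_gpu_layers=-1,     # offload all layers to the iGPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this function in one sentence."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```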
OCuLink provides PCIe 4.0 x4 bandwidth (~8 GB/s), significantly faster than Thunderbolt 4 (~3 GB/s of usable PCIe throughput). This enables a direct, low-overhead link to an external desktop GPU enclosure, which is what puts 30B+ models in reach. Without an eGPU, 32B models are not viable. The OCuLink port is the single most important expansion feature for AI workloads.
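Once a card is attached over OCuLink, a quick PyTorch check confirms the system sees it. This assumes a CUDA-capable card and a CUDA build of PyTorch; a ROCm build for AMD cards exposes the same `torch.cuda` API:

```python
# Verify that an OCuLink-attached eGPU is visible to PyTorch.
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the attached desktop GPU
    vram_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    print(f"{vram_gib:.0f} GiB VRAM")
else:
    print("No eGPU detected - back to the 16GB iGPU budget.")
```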
The built-in speaker and dual-microphone array make this the only mini PC in its class that works out of the box as a voice assistant. Deploy a local Whisper model for speech-to-text, run Llama 3.1 8B for reasoning, and use the speaker for TTS output. No external mic or speaker needed.
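A hedged sketch of that pipeline, assuming Whisper, sounddevice, soundfile, pyttsx3, and a local Ollama server with llama3.1:8b already pulled; model choices and the fixed recording window are illustrative, not tested on this unit:

```python
# Fully local voice loop: record from the dual-mic array, transcribe with
# Whisper, reason with a local 8B model via Ollama, speak through the speaker.
import requests
import sounddevice as sd
import soundfile as sf
import pyttsx3
import whisper

SAMPLE_RATE = 16_000
stt = whisper.load_model("base")   # small STT model leaves memory for the LLM
tts = pyttsx3.init()

def listen(seconds: float = 5.0) -> str:
    """Record a fixed window from the default mic and transcribe it."""
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()
    sf.write("utterance.wav", audio, SAMPLE_RATE)
    return stt.transcribe("utterance.wav")["text"]

def think(prompt: str) -> str:
    """Send the transcript to the local Ollama server and return its reply."""
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return r.json()["response"]

def speak(text: str) -> None:
    tts.say(text)
    tts.runAndWait()

if __name__ == "__main__":
    while True:
        heard = listen()
        if heard.strip():
            speak(think(heard))
```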
At 54W peak and with a compact footprint, this fits into NUC-sized racks, kiosks, or automotive setups. WiFi 7 and Bluetooth 5.4 cover wireless connectivity, and quad 8K display output handles digital signage or monitoring dashboards.
Run a lightweight inference server (vLLM, llama.cpp, Ollama) accessible over the local network. The 48GB system memory allows hosting multiple model instances or serving multiple concurrent users for 7B-8B models. The 2TB NVMe SSD provides ample space for model storage and caching.
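Both Ollama and llama.cpp's server expose an OpenAI-compatible endpoint, so any machine on the LAN can use the Reatan as a drop-in backend. A minimal client sketch, where the hostname reatan.local is a placeholder for the mini PC's address:

```python
# LAN client talking to an Ollama (or llama.cpp) server on the mini PC.
from openai import OpenAI

# Ollama ignores the API key but the client requires a non-empty string.
client = OpenAI(base_url="http://reatan.local:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)
print(resp.choices[0].message.content)
```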
If you want to run local chatbots, code assistants, or RAG pipelines without cloud costs, this is a turnkey solution. The 16GB VRAM handles the most popular open-weight models, and the OCuLink port gives you an upgrade path to larger models without replacing the entire system.
This is not a training machine. 90 GB/s memory bandwidth and 16GB VRAM are insufficient for anything beyond fine-tuning 1B-3B parameter models with LoRA. For training, look at desktop GPUs with 24GB+ VRAM and higher memory bandwidth.
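If you do want to experiment within that 1B-3B ceiling, Hugging Face PEFT keeps the trainable parameter count tiny. A minimal sketch; the model choice and LoRA hyperparameters are illustrative:

```python
# LoRA attaches small low-rank adapters instead of updating full weights,
# which is what makes 1B-3B fine-tuning feasible in a 16GB memory budget.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # illustrative pick
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # adapters are a small fraction of a percent
```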
The Minisforum AI X1 Pro uses the same SoC but ships with 32GB single-channel RAM in its base configuration. The Reatan's 48GB dual-channel setup provides double the memory bandwidth and 50% more system RAM. For AI inference, dual-channel is non-negotiable: single-channel cuts token throughput by 30-40% on memory-bound models.
Pick the Reatan if: You need dual-channel memory for inference performance, or you want the built-in speaker/mic for voice agents.
Pick the Minisforum if: You find it at a significantly lower price and plan to upgrade RAM yourself.
The Mac Mini M4 Pro offers far higher memory bandwidth (273 GB/s) and better single-core CPU performance, but costs $1,399+. The Reatan wins on: OCuLink eGPU expansion, Windows compatibility for CUDA-dependent tools (via eGPU), and the integrated audio peripherals.
Pick the Reatan if: You need Windows-native tooling, eGPU upgradability, or a voice-interactive form factor.
Pick the Mac Mini if: You prefer macOS, need higher memory bandwidth for larger models, or want Apple's mature Core ML stack for on-device ML workflows.
Estimated throughput and memory footprint for popular open-weight models on this system:

| Model | Vendor | Params | Grade | Speed | Est. memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | B | 13.5 tok/s | 5.4 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | B | 8.5 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | B | 8.5 tok/s | 8.5 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | B | 6.4 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | B | 6.6 tok/s | 11.0 GB |
| Llama 2 13B Chat | Meta | 13B | B | 8.6 tok/s | 8.5 GB |
| | | 8B | B | 12.8 tok/s | 5.7 GB |
| | | 9B | B | 12.0 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | B | 15.1 tok/s | 4.8 GB |
| Gemma 4 E4B IT | Google | 4B | B | 10.5 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | B | 10.5 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | B | 11.3 tok/s | 6.4 GB |
| Gemma 4 E2B IT | Google | 2B | B | 19.5 tok/s | 3.7 GB |
| | | 8B | C | 5.4 tok/s | 13.3 GB |
| Qwen3.5-9B | Alibaba Cloud (Qwen) | 9B | F | 2.9 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 1.9 tok/s | 39.0 GB |
| Qwen3.6-27B | Alibaba Cloud | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 1.7 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba Cloud (Qwen) | 27B | F | 1.0 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 0.9 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba Cloud (Qwen) | 32.8B | F | 1.3 tok/s | 53.9 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | F | 3.0 tok/s | 24.4 GB |
| LLaMA 65B | Meta | 65B | F | 1.8 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | F | 1.7 tok/s | 43.4 GB |
| | | 70B | F | 1.6 tok/s | 45.7 GB |