No image

MINISFORUM

MINISFORUM AI X1 Pro-370

Name: MINISFORUM AI X1 Pro-370
Brand: MINISFORUM
Price: 1099 USD
Availability: InStock

Copilot+ AI PC mini desktop with AMD Ryzen AI 9 HX 370 (Strix Point), Radeon 890M iGPU, and a 50 TOPS XDNA 2 NPU. 32GB DDR5, 1TB SSD, OCuLink for eGPU expansion. Always-on local AI sidekick.

AI PCs & LaptopsIn Stock

Edge AIMobile / On-DeviceEnergy Efficient

Buy on Amazon$1,099Calculate ROI

PayPerQ—Pay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ

Our Take

Best for: Sweet spot for 13B–20B dense models at Q4

Good balance for indie developers running local copilots and chat. 30B+ models are reachable but only with aggressive quantization and short context.

Pair this with

Specifications

Overview

The MINISFORUM AI X1 Pro-370 is a Copilot+ certified mini desktop that delivers 50 TOPS from its AMD XDNA 2 NPU, backed by a Ryzen AI 9 HX 370 processor and Radeon 890M iGPU. At $1,099 (32GB/1TB), it occupies the prosumer sweet spot—more capable than a typical office mini PC, but at a fraction of the cost and power draw of a workstation-class machine. For AI engineers and local-inference practitioners, the X1 Pro-370 offers a balance of on-device AI acceleration, expandability via OCuLink, and a form factor that fits on a desk or in a rack.

MINISFORUM positions this as the value leader in its X1 Pro lineup, trading the marginal NPU boost of the 470 model (86 vs 80 combined TOPS) for a lower price without sacrificing core AI capability. It competes directly with other Strix Point mini PCs like the ASUS NUC 14 Pro+ and the Geekom A8, but stands out by including an OCuLink port for external GPU expansion—critical for AI workloads that exceed the iGPU’s shared memory budget.

---

AI Performance & Specifications

Compute and Memory

The HX 370’s 12-core CPU (4 Zen 5 + 8 Zen 5c) provides ample headroom for pre-processing and orchestration, but the AI story lives in the iGPU and NPU. The Radeon 890M (RDNA 3.5, 16 CUs) delivers roughly 30 TOPS of FP16/INT8 compute, and the XDNA 2 NPU adds another 50 dedicated TOPS for sustained inference. Combined platform AI throughput is 80 TOPS, enough to run Copilot+ features and small-to-medium LLMs entirely on-device.

The critical constraint is memory. The system ships with 32GB DDR5-5600 in dual-channel, delivering ~90 GB/s bandwidth. That bandwidth is shared between CPU and iGPU—there is no dedicated VRAM. For inference, the iGPU can address up to 16GB (configurable in BIOS), but that allocation comes from system memory. This limits model size and token generation speed relative to a dedicated GPU with high-bandwidth memory.

Metric	Value	Impact on AI
VRAM (iGPU allocatable)	16 GB	Fits 13B at Q4, 8B at Q4 with room for context
Memory bandwidth	90 GB/s	~20-30 tok/s for 7B Llama 3.1 at Q4; ~10-15 tok/s for 13B
INT8 TOPS (NPU)	50	Efficient for small models (<3B), vision transformers
TDP	28W base / 54W configurable	Excellent energy efficiency for always-on agents
Expansion	OCuLink	Enables 32B at Q3 via eGPU (e.g., RTX 4060 or A-series)

Power Efficiency

At 54W peak (135W adapter), the X1 Pro-370 sips power compared to a desktop GPU solution. For always-on local agents or edge deployment scenarios, this is a key advantage—noise is low, heat is minimal, and electricity costs are negligible. The integrated power supply (brick adapter) and compact chassis make it suitable for co-location or remote sites.

---

What Models Can It Run?

The X1 Pro-370 is a capable local inference machine for models up to about 13B parameters, with a clear trade-off between quality and speed. Here’s what you can expect for popular open-weight LLMs.

Native (iGPU only, 16 GB VRAM)

Llama 3.1 8B (Q4_K_M): ~20-30 tokens/second. Comfortably fits in 6GB, leaving headroom for 8K-16K context.
Mistral 7B (Q4_K_M): ~25-35 tok/s. Excellent for real-time chat and code generation.
Qwen 2.5 7B (Q4_K_M): ~20-30 tok/s. Multilingual and long-context variants fit with moderate context (<8K).
DeepSeek-R1 7B (Q4_K_M): ~20-28 tok/s. Reasoning-heavy tasks work well.
Llama 3.1 13B (Q4_K_M): ~10-15 tok/s. Fits with ~4GB leftover for 4K context. Usable but not interactive for long generations.
DeepSeek-V2 Lite 16B (Q3_K_M): ~8-12 tok/s. Tight fit—requires careful context management. Better suited via OCuLink.

With eGPU via OCuLink

Plug in an external GPU (e.g., RTX 4060 with 12GB VRAM, or an RTX 3090 with 24GB). This unlocks:

Llama 3.1 32B (Q3_K_M): ~8-12 tok/s on a 12GB eGPU. The PCIe 4.0 x4 OCuLink connection adds ~1-2ms latency per token, acceptable for batch inference.
Mixtral 8x7B (Q4_K_M): ~10-15 tok/s on a 12GB eGPU.
Llama 3.1 70B (Q4_K_M): requires 24GB+ eGPU, achievable with a used RTX 3090.

Multimodal & Vision Models

The Radeon 890M can run LLaVA-NeXT 7B (Q4) at ~15 tok/s and smaller vision transformers (e.g., CLIP, SigLIP) efficiently via the NPU. Stable Diffusion XL 1.0 base runs at ~1-2 it/s on the iGPU—usable for single-image generation but slow for batch work. OCuLink dramatically improves this.

Sweet Spot Recommendation

For daily driver local LLM inference, target 7B-8B models at Q4_K_M. They fit comfortably, deliver interactive speeds, and the 32GB system RAM allows multiple model swaps or large context windows (up to 32K tokens). If you need higher quality from a 13B or MoE model, use the OCuLink eGPU path—or accept slower single-stream output.

---

Use Cases & Target Audience

AI Developers & Hobbyists

The X1 Pro-370 is ideal for developers prototyping agentic workflows, local RAG pipelines, or testing model quantization. The NPU offloads small inference tasks (classification, embedding, function calling) while the CPU/GPU handles heavier loads. The upgradeable SO-DIMM RAM (up to 128GB) and three M.2 slots mean you can expand storage and memory without replacing the unit.

Edge & On-Device Deployments

At 54W, this machine is a solid candidate for edge inference boxes, kiosks, or always-on local AI assistants. Dual 2.5GbE and WiFi 7 provide robust networking for multi-node setups. The OCuLink port offers a future path to higher compute without replacing the base unit.

Teams Running Inference Servers

If you need a low-power, quiet server for serving small models to a few concurrent users, the X1 Pro-370 works well with llama.cpp, Ollama, or vLLM. Within a homelab or small office, it can handle 2-3 concurrent 7B Q4 streams at 15-20 tok/s each.

Not for Training

The shared memory and modest iGPU make fine-tuning impractical beyond lightweight LoRA adapters with batch size 1. For training, look to a dedicated GPU workstation or cloud instance.

---

How It Compares

Versus ASUS NUC 14 Pro+ (Core Ultra 9 285H)

The NUC 14 Pro+ offers 34 TOPS NPU (Intel AI Boost) and up to 96GB RAM, but no OCuLink. Its iGPU (Arc Xe-LPG) lags behind the Radeon 890M for LLM inference. The X1 Pro-370 wins on raw NPU TOPS and the expansion path for eGPU. Choose the NUC if you need more CPU memory bandwidth (LP-DDR5x-7467) for CPU-based inference.

Versus Geekom A8 (Ryzen 9 8945HS)

The Geekom A8 uses the previous-gen Zen 4 architecture with a 16 TOPS NPU and Radeon 780M iGPU. While cheaper (~$799), it lacks both the dedicated NPU throughput and OCuLink for future upgrades. For AI workloads, the X1 Pro-370 is the clear step-up—more than double the NPU performance and real expandability.

When to pick the MINISFORUM AI X1 Pro-370: You need local AI inference today, want the flexibility of eGPU expansion without committing to a full desktop build, and value energy efficiency and a compact footprint. For under $1,100, it delivers the best combination of on-device AI and upgrade path in a mini desktop form factor.

Compatible AI Models

Hide F tierOnly popular models

56 models


Qwen3-30B-A3BAlibaba Cloud (Qwen)	30B(3B active)	BB	13.5 tok/s	5.4 GB
Qwen3.6 35B-A3BAlibaba Cloud	35B(3B active)	BB	8.5 tok/s	8.5 GB
Qwen3.5-35B-A3BAlibaba Cloud (Qwen)	35B(3B active)	BB	8.5 tok/s	8.5 GB
Mixtral 8x7B InstructMistral AI	46.7B(12.9B active)	BB	6.4 tok/s	11.4 GB
Gemma 4 26B-A4B ITGoogle	26B(4B active)	BB	6.6 tok/s	11.0 GB
Llama 2 13B ChatMeta	13B	BB	8.6 tok/s	8.5 GB
Llama 3 8B InstructMeta	8B	BB	12.8 tok/s	5.7 GB
Carnice-9b for Hermes agentkai-os	9B	BB	12.0 tok/s	6.0 GB
AdPayPerQPay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ
Llama 2 7B ChatMeta	7B	BB	15.1 tok/s	4.8 GB
Gemma 4 E4B ITGoogle	4B	BB	10.5 tok/s	6.9 GB
Gemma 3 4B ITGoogle	4B	BB	10.5 tok/s	6.9 GB
Mistral 7B InstructMistral AI	7B	BB	11.3 tok/s	6.4 GB
Gemma 4 E2B ITGoogle	2B	BB	19.5 tok/s	3.7 GB
Llama 3.1 8B InstructMeta	8B	CC	5.4 tok/s	13.3 GB
Qwen3.5-9BAlibaba Cloud (Qwen)	9B	FF	2.9 tok/s	24.6 GB
Mistral Small 3 24BMistral AI	24B	FF	1.9 tok/s	39.0 GB
AdVast.aiAffordable on-demand GPU rentals for training and inference. Pick from thousands of hosts.Rent a GPU
Qwen3.6-27BAlibaba Cloud	27B	FF	1.0 tok/s	72.8 GB
Gemma 3 27B ITGoogle	27B	FF	1.7 tok/s	43.8 GB
Qwen3.5-27BAlibaba Cloud (Qwen)	27B	FF	1.0 tok/s	72.8 GB
Gemma 4 31B ITGoogle	31B	FF	0.9 tok/s	82.0 GB
Qwen3-32BAlibaba Cloud (Qwen)	32.8B	FF	1.3 tok/s	53.9 GB
Falcon 40B InstructTechnology Innovation Institute	40B	FF	3.0 tok/s	24.4 GB
LLaMA 65BMeta	65B	FF	1.8 tok/s	39.3 GB
Llama 2 70B ChatMeta	70B	FF	1.7 tok/s	43.4 GB
AdRunPodServerless and dedicated GPU cloud built for AI workloads. Spin up instances in seconds.Launch on RunPod
Llama 3 70B InstructMeta	70B	FF	1.6 tok/s	45.7 GB

Rows per page

Page 1 of 3

Quick Specs

VRAM16 GB

INT850 TOPS

TDP54 W

Memory BW90 GB/s

Max Params13B at Q4; 32B at Q3 with eGPU via OCuLink

Form FactorMini Desktop

CPUAMD Ryzen AI 9 HX 370 (12C/24T, 4 Zen 5 + 8 Zen 5c, up to 5.1 GHz)

GPUAMD Radeon 890M iGPU (RDNA 3.5)

NPUXDNA 2 — 50 TOPS

Combined Platform AI80 TOPS

Microsoft Copilot+Certified

Memory32GB DDR5-5600 (dual-channel, ~90 GB/s, shared with iGPU)

Storage1TB PCIe 4.0 NVMe SSD (3x M.2 slots total)

ExpansionOCuLink for external GPU enclosure

Connectivity2x USB4, dual 2.5GbE, WiFi 7, BT 5.4, 8K quad display

Power Supply135W external adapter

TDP28W base / 54W configurable

OSWindows 11 Pro