Compact all-Intel mini PC with an i9-13900HK and a discrete Intel Arc A770 (16 GB). XMX-accelerated inference backed by 512 GB/s of VRAM bandwidth, 32 GB DDR5, a 1 TB SSD, and six-display 8K output for multi-monitor AI dev work.
Good balance for indie developers running local copilots and chat. 30B+ models are reachable but only with aggressive quantization and short context.
The ACEMAGIC M1A Pro is a compact mini PC that pairs an Intel Core i9-13900HK processor with a discrete Intel Arc A770 GPU (16 GB GDDR6) in a chassis barely larger than a console. At $799 MSRP, it targets a niche between consumer-grade mini PCs and full-tower workstations — a prosumer edge device for local AI inference, multi-monitor development, and agentic workflows that demand GPU acceleration without the footprint or power draw of a desktop rig.
What sets this machine apart is its use of an MXM-format Intel Arc A770, a desktop-class GPU packed into a mini PC. That gives you 16 GB of VRAM with 512 GB/s of bandwidth and Intel’s XMX matrix engines for AI acceleration. Combined with 32 GB of DDR5 system RAM and a 14-core i9 CPU, the M1A Pro can run 13B-parameter models entirely on GPU at reasonable quantization and offload larger models (up to 32B) into system RAM. It competes directly with laptops like the ASUS ROG Flow Z13 and other lower-end RTX 4060 machines, but offers better sustained thermal performance thanks to its larger cooling solution. For edge deployment, it’s a viable alternative to an NVIDIA Jetson Orin or a Mac Mini with M4 Pro, depending on your software stack and model requirements.
The specs that matter for inference are VRAM capacity, memory bandwidth, and compute throughput. Here’s how the M1A Pro breaks down:
| Spec | Value | Why It Matters |
|---|---|---|
| VRAM | 16 GB GDDR6 | Fits Llama 3.1 8B at Q4_K_M (~6 GB), Mistral 7B at Q8 (~7 GB), Qwen 2.5 14B at Q4 (~8 GB). 13B at Q5_K_M (~10 GB) fits entirely. |
| Memory Bandwidth | 512 GB/s | Determines token generation speed. Expect ~20–30 tok/s for 7B models, ~10–15 tok/s for 13B at Q4. |
| XMX Engines | 4096 ALUs @ 2.1 GHz (approx 17 TFLOPS FP16) | Intel’s matrix engines accelerate transformer inference via IPEX-LLM or OpenVINO. Real-world throughput competitive with RTX 3060/4060 for LLM inference. |
| TDP | 300 W (system) | The GPU alone draws up to 225W. The system runs warm under load — expect fan noise at sustained inference. |
| CPU | i9-13900HK (14C/20T, 5.4 GHz) | Handles prompt processing, tokenization, and offloaded layers. 54W cTDP keeps thermals manageable. |
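As a sanity check on the VRAM column, here is a rough sketch of how those footprints come about: weights scale with parameter count times bits per weight, plus some runtime overhead for the KV cache and compute buffers. The bits-per-weight figures and the overhead margin below are ballpark assumptions (real GGUF files can differ by a gigabyte or so), not measured values.

```python
# Ballpark VRAM footprint for a GGUF-quantized model (assumptions, not measurements).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}  # approximate bpw per quant

def estimate_vram_gb(params_billion: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Weights plus a rough allowance for KV cache and compute buffers."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

for name, params, quant in [("Llama 3.1 8B", 8, "Q4_K_M"),
                            ("Mistral 7B", 7, "Q8_0"),
                            ("Qwen 2.5 14B", 14, "Q4_K_M")]:
    print(f"{name} @ {quant}: ~{estimate_vram_gb(params, quant):.1f} GB")
```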
512 GB/s is the critical number. For context, an RTX 4060 laptop GPU has ~256 GB/s and an RTX 4090 desktop has ~1 TB/s. The M1A Pro sits in the middle: good enough for real-time chat at 7B–13B, but not for high-throughput serving of large models. With llama.cpp and Intel’s Vulkan backend, expect roughly 20–30 tok/s for 7B–8B models at Q4 and 10–15 tok/s for 13B models at Q4–Q5.
These are plausible for a 512 GB/s GPU with XMX support. Performance will vary based on prompt length, batch size, and software stack. Using Intel’s IPEX-LLM (which optimizes for XMX) can yield 10–20% higher throughput than generic Vulkan.
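To see where those figures come from, the sketch below applies the usual memory-bound rule of thumb: generating each token requires streaming roughly the full set of weights once, so decode speed is approximately bandwidth divided by model size, scaled by an efficiency factor. The 0.3 efficiency value is an assumption standing in for real Vulkan/XMX utilization, which varies by stack, prompt length, and batch size.

```python
# Rule-of-thumb decode speed for memory-bandwidth-bound generation.
def estimate_tok_per_s(bandwidth_gb_s: float, model_size_gb: float,
                       efficiency: float = 0.3) -> float:
    # One token's forward pass reads roughly every weight byte once.
    return bandwidth_gb_s * efficiency / model_size_gb

print(f"{estimate_tok_per_s(512, 4.9):.0f} tok/s")   # ~31 tok/s, 8B model at Q4_K_M
print(f"{estimate_tok_per_s(512, 10.0):.0f} tok/s")  # ~15 tok/s, 13B model at Q5_K_M
```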
At 300W peak, the M1A Pro is not a low-power edge device. But for a mini PC with a discrete GPU, it’s reasonable. Idle power is around 40W. If you’re deploying at the edge with limited cooling or battery, consider a lower-TDP alternative. For a desk setup, the trade-off is acceptable for the inference performance.
These models fit entirely on the GPU at the specified quantization, avoiding system-RAM offload and maximizing speed:

- Llama 3.1 8B at Q4_K_M (~6 GB)
- Mistral 7B at Q8 (~7 GB)
- Qwen 2.5 14B at Q4 (~8 GB)
- 13B models at Q5_K_M (~10 GB)
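For this fully-resident case, a minimal llama-cpp-python sketch looks like the following. It assumes a Vulkan (or SYCL) build of llama.cpp and a locally downloaded GGUF file; the model path is a placeholder.

```python
from llama_cpp import Llama

# Load an 8B instruct model at Q4_K_M and push every layer onto the Arc A770.
llm = Llama(
    model_path="./models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 = offload all layers; the whole model fits in 16 GB VRAM
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-offs of Q4 vs Q8 quantization."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```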
With 32 GB of DDR5, you can run larger models by offloading some layers to system RAM. Expect slower generation (5–10 tok/s) but still usable speeds for interactive tasks; Mixtral 8x7B or a dense ~30B model at Q4, split between VRAM and DDR5, are the typical candidates.
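The partial-offload variant is the same call with a finite n_gpu_layers: keep as many layers on the GPU as the 16 GB allows and let the remainder run from DDR5 on the CPU. The layer count below is an assumption to tune per model, not a measured sweet spot.

```python
from llama_cpp import Llama

# A ~32B model at Q4 no longer fits in 16 GB of VRAM, so split it between GPU and system RAM.
llm = Llama(
    model_path="./models/qwen2.5-32b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=40,  # layers beyond this stay in DDR5 and run on the i9
    n_ctx=4096,
)
```

Raise n_gpu_layers until VRAM is nearly full; every layer moved off the CPU is a direct speed win.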
The M1A Pro can run vision-language models (e.g., LLaVA 1.6 7B, Qwen-VL 7B) entirely on GPU. For long-context tasks (32K+ tokens), the 32 GB system RAM helps with KV cache offload, but generation speed will drop. The six-display output (USB4 8K + 2x DP 2.0 8K + 2x HDMI 2.0) makes it ideal for multi-monitor AI development — run your model on one screen, monitor logs on another, and output on a third.
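For the long-context case, the relevant knobs in llama-cpp-python are the context size and where the KV cache lives. A sketch, assuming the offload_kqv parameter behaves as documented in the version you install:

```python
from llama_cpp import Llama

# 32K-token window: the KV cache grows with context and can crowd out weights in VRAM.
llm = Llama(
    model_path="./models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,
    n_ctx=32768,        # long context; KV cache size scales with this value
    offload_kqv=False,  # keep the KV cache in the 32 GB of DDR5 instead of VRAM
)
```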
For most practitioners, the best quality-to-speed tradeoff is Llama 3.1 8B at Q4_K_M or Qwen 2.5 14B at Q4_K_M. Both fit in VRAM, produce high-quality outputs, and deliver interactive speeds (20+ tok/s). If you need 13B parameter reasoning, use Q5_K_M — still fits and gives better coherence than Q4.
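If you prefer Ollama over raw llama.cpp, those recommended picks are one call away. The quantized tag names below are assumptions; check the Ollama library for the exact tags before pulling.

```python
import ollama

# Recommended starting point: an 8B or 14B instruct model at Q4_K_M.
resp = ollama.chat(
    model="llama3.1:8b-instruct-q4_K_M",  # or "qwen2.5:14b-instruct-q4_K_M"
    messages=[{"role": "user", "content": "Write a docstring for a retry decorator."}],
)
print(resp["message"]["content"])
```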
llama.cpp, Ollama, and LM Studio work out of the box, and 16 GB of VRAM is enough for most 7B–13B models at decent quantization. For light serving, vLLM or llama.cpp’s server mode will do: not production-scale serving, but fine for dev/test (see the client sketch after the laptop comparison below).

The ROG Flow Z13 is a 2-in-1 laptop with a similar CPU and an RTX 4060 (8 GB VRAM, 256 GB/s). The M1A Pro offers double the VRAM and double the memory bandwidth, a clear advantage for LLM inference. The Flow Z13 is portable; the M1A Pro stays on your desk. If you need mobility, pick the Flow. If you need to run 13B models or larger quantizations, the M1A Pro wins.
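Picking up the dev/test serving point from above: once llama.cpp’s server (or vLLM) is running locally, any OpenAI-compatible client can talk to it. The host, port, and model name here are assumptions for the sketch.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # whatever name the local server registers
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension."}],
)
print(resp.choices[0].message.content)
```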
The Mac Mini M4 Pro offers 24 GB of unified memory (shared CPU/GPU) with ~273 GB/s of bandwidth. For models that fit in 24 GB (e.g., Llama 3.1 8B at Q8, Mixtral 8x7B at Q3), Apple’s Metal backend can be competitive with Intel Arc. The M1A Pro is cheaper ($799 vs ~$1,400 for the M4 Pro) and runs Windows-native tools like llama.cpp through its Vulkan and SYCL backends. The Mac Mini is quieter and more power-efficient. Choose the M1A Pro if you need Windows, prefer Intel tooling, or want to run models that require more than 16 GB but less than 32 GB via offload.
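If you go the Intel-tooling route, IPEX-LLM is the path to the XMX engines from Python. A sketch following Intel’s documented usage, with the model id as a placeholder and the exact API worth verifying against the ipex-llm release you install:

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any HF causal LM

# Load the model in 4-bit and move it to the Arc GPU ("xpu" device).
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
                                             trust_remote_code=True).to("xpu")
tok = AutoTokenizer.from_pretrained(model_id)

inputs = tok("Explain KV caching in two sentences.", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```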
| Model | Developer | Params | Grade | Speed | Est. Memory |
|---|---|---|---|---|---|
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | S | 48.3 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | S | 48.3 tok/s | 8.5 GB |
| Llama 2 13B Chat | Meta | 13B | S | 48.7 tok/s | 8.5 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | S | 36.3 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | S | 37.4 tok/s | 11.0 GB |
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | S | 76.5 tok/s | 5.4 GB |
|  |  | 9B | S | 68.5 tok/s | 6.0 GB |
|  |  | 8B | S | 72.8 tok/s | 5.7 GB |
| Gemma 4 E4B IT | Google | 4B | S | 59.6 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | S | 59.6 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | S | 64.5 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | A | 86.1 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | A | 111.2 tok/s | 3.7 GB |
|  |  | 8B | A | 30.9 tok/s | 13.3 GB |
| Qwen3.5-9B | Alibaba | 9B | F | 16.8 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 10.6 tok/s | 39.0 GB |
| Qwen3.6-27B | Alibaba | 27B | F | 5.7 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 9.4 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba | 27B | F | 5.7 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 5.0 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba | 32.8B | F | 7.6 tok/s | 53.9 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | F | 16.9 tok/s | 24.4 GB |
| LLaMA 65B | Meta | 65B | F | 10.5 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | F | 9.5 tok/s | 43.4 GB |
|  |  | 70B | F | 9.0 tok/s | 45.7 GB |