No image

GMKtec

GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB)

Name: GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB)
Brand: GMKtec
Price: 1999 USD
Availability: InStock

Flagship Strix Halo mini PC with 128GB unified LPDDR5X-8000 and Radeon 8060S. Unlocks the full 96GB GPU allocation — runs 70B at Q5 and Qwen3-235B sparse models locally. Won the American Good Design Platinum Award 2025.

AI PCs & LaptopsIn Stock

Best for LLMsEdge AIPremium / High-EndEnergy Efficient

Buy on Amazon$1,999Calculate ROI

PayPerQ—Pay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ

Quick Specs

VRAM96 GB

INT850 TOPS

TDP140 W

Memory BW256 GB/s

Max Params70B at Q5_K_M; 120B at Q4 with effort; 235B sparse at ~11 tok/s

Form FactorMini PC

APUAMD Ryzen AI Max+ 395 (16C/32T Zen 5, Radeon 8060S 40 RDNA 3.5 CUs, XDNA 2 NPU 50 TOPS)

Boost ClockUp to 5.1 GHz

iGPU VRAM AllocationUp to 96GB unified

Memory128GB LPDDR5X-8000 unified (256-bit, 256 GB/s, shared with GPU)

Storage2TB PCIe 4.0 NVMe SSD (Crucial P3)

Display OutputQuad 8K

ConnectivityUSB4, WiFi 7, BT 5.4, SD Card 4.0

TDP120W sustained / 140W peak

AwardAmerican Good Design Platinum 2025

OSWindows 11

Our Take

Best for: Datacenter inference for flagship dense models

Sized for production serving of 70B–200B class models at full or lightly-quantized precision. Overkill for a homelab; right call when the workload pays for itself in token volume.

Pair this withKimi K2 Instruct (1000B)Largest popular open model that fits at Q4 — needs roughly 51.8 GB on this 96 GB card.

Generated from this product’s spec sheet. Editor reviews refine it over time.

Specifications

Overview

The GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB) is a flagship mini PC purpose-built for local AI inference. It leverages AMD’s Strix Halo APU — a 16-core Zen 5 CPU paired with a Radeon 8060S iGPU and XDNA 2 NPU — to deliver 96 GB of unified GPU-allocatable memory in a compact, low-power chassis. At $1,999, it occupies the prosumer tier: more capable than a consumer laptop, far more energy-efficient than a multi-GPU workstation, and directly competitive with entry-level server-grade inference rigs.

This is not a general-purpose desktop. The EVO-X2 is engineered for one task: running large language models and other AI workloads locally, without cloud dependency. Its 128 GB of LPDDR5X-8000 unified memory can allocate up to 96 GB to the GPU, enabling models that would otherwise require a $10,000+ setup. The design won the American Good Design Platinum Award 2025, reflecting its industrial and thermal engineering — a metal chassis with triple-fan cooling keeps sustained loads under 70 °C.

For developers and researchers who need to run 70B-parameter models at Q5 quantization, or experiment with sparse 235B models, the EVO-X2 is currently the most accessible hardware that can do so in a desk-friendly form factor.

AI Performance & Specifications

VRAM: 96 GB Unified

The headline spec is the 96 GB of GPU-accessible memory. This is not a discrete GPU with dedicated VRAM; it’s unified memory shared between the CPU and GPU via the Radeon 8060S iGPU (40 RDNA 3.5 compute units). The full 96 GB allocation is unlocked by default on this SKU — no BIOS tweaks or memory reservation tricks required. This is the largest unified memory pool available in any mini PC as of mid-2025.

Memory Bandwidth: 256 GB/s

The 128 GB of LPDDR5X-8000 runs on a 256-bit bus, delivering 256 GB/s bandwidth. That’s roughly 60% of a desktop RTX 4090 (1 TB/s) but more than enough for token generation at reasonable batch sizes. For context, a Mac M2 Ultra with 192 GB unified memory peaks at 800 GB/s, but costs over $5,000. The EVO-X2’s bandwidth is a limiting factor for very large models (235B+), but for the 70B–120B range it’s well-matched to the compute throughput.

Compute: 50 TOPS (INT8)

The XDNA 2 NPU provides 50 TOPS for integer operations, but the real workhorse for LLM inference is the iGPU’s FP16/INT4 matrix units. The Radeon 8060S iGPU delivers approximately 12.5 TFLOPS (FP16) — comparable to an RTX 4060 Ti. Combined with the CPU’s 16 Zen 5 cores (boost up to 5.1 GHz), the system achieves ~50 tokens/s on 7B models and ~23 tokens/s on 20B models in real-world benchmarks (e.g., GPT-OSS-20B at 23.8 tok/s out of the box on Ubuntu).

Power Efficiency: 120W sustained / 140W peak

At 140W peak TDP, the EVO-X2 consumes less than a third of a typical gaming GPU. This makes it viable for edge deployment, home labs running 24/7, or offices where noise and heat matter. The triple-fan cooling keeps the APU under 70 °C during sustained inference, though brief throttling may occur after 15 minutes of peak load (per Mini PC Reviewer tests).

Key Specs at a Glance

APU: AMD Ryzen AI Max+ 395 (16C/32T Zen 5, Radeon 8060S 40 CUs, XDNA 2 NPU)
Memory: 128 GB LPDDR5X-8000 (256-bit, 256 GB/s)
GPU VRAM Allocation: Up to 96 GB
Storage: 2 TB PCIe 4.0 NVMe (Crucial P3)
Connectivity: USB4, Wi-Fi 7, BT 5.4, SD Card 4.0, dual LAN (2.5 GbE)
Display: Quad 8K via HDMI 2.1 / DisplayPort 1.4
OS: Windows 11 Pro (also tested with Ubuntu 24.04.3 LTS)

What Models Can It Run?

This is the critical question for any practitioner. The EVO-X2’s 96 GB unified memory and 256 GB/s bandwidth define a clear capability envelope.

Fitting Models in VRAM

Model Size	Quantization	Approx. VRAM Usage	Feasibility
7B	Q4_K_M	~5 GB	Effortless, runs at >50 tok/s
20B	Q5_K_M	~14 GB	Fast (23–30 tok/s)
70B	Q5_K_M	~48 GB	Sweet spot — fits with headroom
120B	Q4_K_M	~72 GB	Possible with effort, ~15 tok/s
235B	Sparse (50%)	~96 GB	~11 tok/s, requires sparse inference support

70B at Q5_K_M: This is the ideal workload. Models like Llama 3.1-70B, Qwen 2.5-72B, or Mistral Large 2 fit comfortably with ~48 GB VRAM usage, leaving room for context (up to 128K tokens). Expect 10–15 tok/s on a single prompt.
120B at Q4: Models such as DeepSeek-R1-120B or Mixtral 8x22B (MoE) can be run with careful context management. Real-world tests show ~15 tok/s for GPT-OSS-120B.
235B Sparse: Qwen3-235B (sparse MoE) can be loaded at ~50% density, using nearly all 96 GB. Token generation slows to ~11 tok/s — usable for batch inference or offline analysis, but not real-time chat.

Multimodal and Long-Context

The unified memory architecture excels at multimodal models (e.g., LLaVA, Qwen-VL) because image embeddings share the same pool without PCIe transfers. Long-context tasks (128K+ tokens) are feasible for models up to 70B; beyond that, context length must be reduced to avoid OOM.

Token Speed Expectations

Based on benchmarks from nishtahir.com and Mini PC Reviewer:

7B Q4_K_M: ~50–55 tok/s (prompt eval >1900 tok/s)
20B Q5_K_M: ~23 tok/s
70B Q5_K_M: ~11–14 tok/s
120B Q4: ~5–8 tok/s (prompt eval slower due to bandwidth)

For production inference servers, batch size 1 is typical; for hobbyist use, these speeds are comfortable for interactive chat on models up to 70B.

Use Cases & Target Audience

Who Should Buy the EVO-X2?

Local LLM Enthusiasts who want to run 70B models without cloud subscriptions or noisy GPU rigs. The EVO-X2 is a silent, compact alternative to a dual RTX 4090 setup.
AI Application Developers building agentic workflows that require private, low-latency inference. The 96 GB unified memory allows serving multiple models simultaneously (e.g., a 7B router + 70B generator).
Edge AI Deployments where power and space are constrained but model size cannot be compromised. At 140W peak, it can run off a standard outlet in a retail kiosk, medical device, or field lab.
Research Teams needing to fine-tune or experiment with 70B–120B models on a single machine. While not designed for training (no discrete GPU), the EVO-X2 can handle LoRA fine-tuning on small batches.

Who Should Look Elsewhere?

High-throughput inference servers serving hundreds of concurrent users — you need multiple GPUs with higher bandwidth.
Training large models from scratch — the iGPU’s 12.5 TFLOPS is insufficient for pre-training.
Gamers — the Radeon 8060S can handle 1440p at medium settings, but a discrete GPU is better for high-refresh gaming.

How It Compares

vs. Mac Studio (M2 Ultra, 192 GB)

Price: EVO-X2 $1,999 vs. Mac Studio ~$5,000+ for 192 GB.
VRAM: 96 GB vs. 192 GB — Mac Studio wins for 120B+ models.
Bandwidth: 256 GB/s vs. 800 GB/s — Mac Studio is faster for token generation.
Software: EVO-X2 runs Windows/Linux with full CUDA/ROCm compatibility via Vulkan/LLVM; Mac Studio limited to Metal and Apple’s stack.
Pick EVO-X2 if: you need Linux-native tooling (llama.cpp, vLLM, Ollama) and a lower budget.

vs. NVIDIA RTX 4090 (24 GB) in a desktop

VRAM: 96 GB vs. 24 GB — EVO-X2 can run 70B models that a 4090 cannot fit.
Speed: 4090 generates tokens ~2–3× faster for models that fit (e.g., 7B at 150 tok/s).
Power: 140W vs. 450W — EVO-X2 is far more efficient.
Pick EVO-X2 if: your models exceed 24 GB VRAM, or you need a silent, compact system.

vs. Apple Mac Mini M4 Pro (64 GB)

VRAM: 96 GB vs. 64 GB — EVO-X2 supports larger models.
Bandwidth: 256 GB/s vs. 273 GB/s — comparable.
Price: $1,999 vs. $1,399 — Mac Mini is cheaper for smaller models.
Pick EVO-X2 if: you need to run 70B+ models or prefer x86 ecosystem.

The GMKtec EVO-X2 is currently the most cost-effective way to get 96 GB of unified memory for AI inference in a compact, low-power package. It fills a gap between consumer laptops and enterprise workstations, making large local models accessible to individual developers and small teams.

Compatible AI Models

Hide F tierOnly popular models

56 models


Qwen3-30B-A3BAlibaba Cloud (Qwen)	30B(3B active)	AA	38.3 tok/s	5.4 GB
Llama 3 8B InstructMeta	8B	AA	36.4 tok/s	5.7 GB
Llama 2 7B ChatMeta	7B	AA	43.0 tok/s	4.8 GB
Carnice-9b for Hermes agentkai-os	9B	AA	34.3 tok/s	6.0 GB
Gemma 4 E2B ITGoogle	2B	AA	55.6 tok/s	3.7 GB
Qwen3.6 35B-A3BAlibaba Cloud	35B(3B active)	AA	24.2 tok/s	8.5 GB
Qwen3.5-35B-A3BAlibaba Cloud (Qwen)	35B(3B active)	AA	24.2 tok/s	8.5 GB
Mistral 7B InstructMistral AI	7B	AA	32.2 tok/s	6.4 GB
AdPayPerQPay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ
Llama 2 13B ChatMeta	13B	AA	24.3 tok/s	8.5 GB
Gemma 4 E4B ITGoogle	4B	BB	29.8 tok/s	6.9 GB
Gemma 3 4B ITGoogle	4B	BB	29.8 tok/s	6.9 GB
Mixtral 8x7B InstructMistral AI	46.7B(12.9B active)	BB	18.1 tok/s	11.4 GB
Gemma 4 26B-A4B ITGoogle	26B(4B active)	BB	18.7 tok/s	11.0 GB
GLM-4.5Z.ai	355B(32B active)	BB	4.0 tok/s	51.8 GB
Kimi K2 InstructMoonshot AI	1000B(32B active)	BB	4.0 tok/s	51.8 GB
Llama 3 70B InstructMeta	70B	BB	4.5 tok/s	45.7 GB
AdVast.aiAffordable on-demand GPU rentals for training and inference. Pick from thousands of hosts.Rent a GPU
GLM-4.7Z.ai	358B(32B active)	BB	3.9 tok/s	52.6 GB
Qwen3.5-397B-A17BAlibaba Cloud (Qwen)	397B(17B active)	BB	4.5 tok/s	46.0 GB
Qwen 3.5 OmniAlibaba Cloud	397B(17B active)	BB	4.6 tok/s	45.2 GB
Llama 2 70B ChatMeta	70B	BB	4.7 tok/s	43.4 GB
Mixtral 8x22B InstructMistral AI	141B(39B active)	BB	4.7 tok/s	43.6 GB
DeepSeek-V3DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
DeepSeek-R1DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
DeepSeek-V3.1DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
AdRunPodServerless and dedicated GPU cloud built for AI workloads. Spin up instances in seconds.Launch on RunPod
DeepSeek-V3.2DeepSeek	685B(37B active)	BB	3.4 tok/s	59.8 GB

Rows per page

Page 1 of 3

GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB)

AI PCs & LaptopsIn Stock

Best for LLMsEdge AIPremium / High-EndEnergy Efficient

Buy on Amazon$1,999Calculate ROI

PayPerQ—Pay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ

Quick Specs

VRAM96 GB

INT850 TOPS

TDP140 W

Memory BW256 GB/s

Max Params70B at Q5_K_M; 120B at Q4 with effort; 235B sparse at ~11 tok/s

Form FactorMini PC

APUAMD Ryzen AI Max+ 395 (16C/32T Zen 5, Radeon 8060S 40 RDNA 3.5 CUs, XDNA 2 NPU 50 TOPS)

Boost ClockUp to 5.1 GHz

iGPU VRAM AllocationUp to 96GB unified

Memory128GB LPDDR5X-8000 unified (256-bit, 256 GB/s, shared with GPU)

Storage2TB PCIe 4.0 NVMe SSD (Crucial P3)

Display OutputQuad 8K

ConnectivityUSB4, WiFi 7, BT 5.4, SD Card 4.0

TDP120W sustained / 140W peak

AwardAmerican Good Design Platinum 2025

OSWindows 11

Our Take

Best for: Datacenter inference for flagship dense models

Sized for production serving of 70B–200B class models at full or lightly-quantized precision. Overkill for a homelab; right call when the workload pays for itself in token volume.

Pair this withKimi K2 Instruct (1000B)Largest popular open model that fits at Q4 — needs roughly 51.8 GB on this 96 GB card.

Generated from this product’s spec sheet. Editor reviews refine it over time.

Specifications

Overview

AI Performance & Specifications

VRAM: 96 GB Unified

Memory Bandwidth: 256 GB/s

Compute: 50 TOPS (INT8)

Power Efficiency: 120W sustained / 140W peak

Key Specs at a Glance

APU: AMD Ryzen AI Max+ 395 (16C/32T Zen 5, Radeon 8060S 40 CUs, XDNA 2 NPU)
Memory: 128 GB LPDDR5X-8000 (256-bit, 256 GB/s)
GPU VRAM Allocation: Up to 96 GB
Storage: 2 TB PCIe 4.0 NVMe (Crucial P3)
Connectivity: USB4, Wi-Fi 7, BT 5.4, SD Card 4.0, dual LAN (2.5 GbE)
Display: Quad 8K via HDMI 2.1 / DisplayPort 1.4
OS: Windows 11 Pro (also tested with Ubuntu 24.04.3 LTS)

What Models Can It Run?

This is the critical question for any practitioner. The EVO-X2’s 96 GB unified memory and 256 GB/s bandwidth define a clear capability envelope.

Fitting Models in VRAM

Model Size	Quantization	Approx. VRAM Usage	Feasibility
7B	Q4_K_M	~5 GB	Effortless, runs at >50 tok/s
20B	Q5_K_M	~14 GB	Fast (23–30 tok/s)
70B	Q5_K_M	~48 GB	Sweet spot — fits with headroom
120B	Q4_K_M	~72 GB	Possible with effort, ~15 tok/s
235B	Sparse (50%)	~96 GB	~11 tok/s, requires sparse inference support

70B at Q5_K_M: This is the ideal workload. Models like Llama 3.1-70B, Qwen 2.5-72B, or Mistral Large 2 fit comfortably with ~48 GB VRAM usage, leaving room for context (up to 128K tokens). Expect 10–15 tok/s on a single prompt.
120B at Q4: Models such as DeepSeek-R1-120B or Mixtral 8x22B (MoE) can be run with careful context management. Real-world tests show ~15 tok/s for GPT-OSS-120B.
235B Sparse: Qwen3-235B (sparse MoE) can be loaded at ~50% density, using nearly all 96 GB. Token generation slows to ~11 tok/s — usable for batch inference or offline analysis, but not real-time chat.

Multimodal and Long-Context

Token Speed Expectations

Based on benchmarks from nishtahir.com and Mini PC Reviewer:

7B Q4_K_M: ~50–55 tok/s (prompt eval >1900 tok/s)
20B Q5_K_M: ~23 tok/s
70B Q5_K_M: ~11–14 tok/s
120B Q4: ~5–8 tok/s (prompt eval slower due to bandwidth)

For production inference servers, batch size 1 is typical; for hobbyist use, these speeds are comfortable for interactive chat on models up to 70B.

Use Cases & Target Audience

Who Should Buy the EVO-X2?

Local LLM Enthusiasts who want to run 70B models without cloud subscriptions or noisy GPU rigs. The EVO-X2 is a silent, compact alternative to a dual RTX 4090 setup.
AI Application Developers building agentic workflows that require private, low-latency inference. The 96 GB unified memory allows serving multiple models simultaneously (e.g., a 7B router + 70B generator).
Edge AI Deployments where power and space are constrained but model size cannot be compromised. At 140W peak, it can run off a standard outlet in a retail kiosk, medical device, or field lab.
Research Teams needing to fine-tune or experiment with 70B–120B models on a single machine. While not designed for training (no discrete GPU), the EVO-X2 can handle LoRA fine-tuning on small batches.

Who Should Look Elsewhere?

High-throughput inference servers serving hundreds of concurrent users — you need multiple GPUs with higher bandwidth.
Training large models from scratch — the iGPU’s 12.5 TFLOPS is insufficient for pre-training.
Gamers — the Radeon 8060S can handle 1440p at medium settings, but a discrete GPU is better for high-refresh gaming.

How It Compares

vs. Mac Studio (M2 Ultra, 192 GB)

Price: EVO-X2 $1,999 vs. Mac Studio ~$5,000+ for 192 GB.
VRAM: 96 GB vs. 192 GB — Mac Studio wins for 120B+ models.
Bandwidth: 256 GB/s vs. 800 GB/s — Mac Studio is faster for token generation.
Software: EVO-X2 runs Windows/Linux with full CUDA/ROCm compatibility via Vulkan/LLVM; Mac Studio limited to Metal and Apple’s stack.
Pick EVO-X2 if: you need Linux-native tooling (llama.cpp, vLLM, Ollama) and a lower budget.

vs. NVIDIA RTX 4090 (24 GB) in a desktop

VRAM: 96 GB vs. 24 GB — EVO-X2 can run 70B models that a 4090 cannot fit.
Speed: 4090 generates tokens ~2–3× faster for models that fit (e.g., 7B at 150 tok/s).
Power: 140W vs. 450W — EVO-X2 is far more efficient.
Pick EVO-X2 if: your models exceed 24 GB VRAM, or you need a silent, compact system.

vs. Apple Mac Mini M4 Pro (64 GB)

VRAM: 96 GB vs. 64 GB — EVO-X2 supports larger models.
Bandwidth: 256 GB/s vs. 273 GB/s — comparable.
Price: $1,999 vs. $1,399 — Mac Mini is cheaper for smaller models.
Pick EVO-X2 if: you need to run 70B+ models or prefer x86 ecosystem.

Compatible AI Models

Hide F tierOnly popular models

56 models


Qwen3-30B-A3BAlibaba Cloud (Qwen)	30B(3B active)	AA	38.3 tok/s	5.4 GB
Llama 3 8B InstructMeta	8B	AA	36.4 tok/s	5.7 GB
Llama 2 7B ChatMeta	7B	AA	43.0 tok/s	4.8 GB
Carnice-9b for Hermes agentkai-os	9B	AA	34.3 tok/s	6.0 GB
Gemma 4 E2B ITGoogle	2B	AA	55.6 tok/s	3.7 GB
Qwen3.6 35B-A3BAlibaba Cloud	35B(3B active)	AA	24.2 tok/s	8.5 GB
Qwen3.5-35B-A3BAlibaba Cloud (Qwen)	35B(3B active)	AA	24.2 tok/s	8.5 GB
Mistral 7B InstructMistral AI	7B	AA	32.2 tok/s	6.4 GB
AdPayPerQPay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ
Llama 2 13B ChatMeta	13B	AA	24.3 tok/s	8.5 GB
Gemma 4 E4B ITGoogle	4B	BB	29.8 tok/s	6.9 GB
Gemma 3 4B ITGoogle	4B	BB	29.8 tok/s	6.9 GB
Mixtral 8x7B InstructMistral AI	46.7B(12.9B active)	BB	18.1 tok/s	11.4 GB
Gemma 4 26B-A4B ITGoogle	26B(4B active)	BB	18.7 tok/s	11.0 GB
GLM-4.5Z.ai	355B(32B active)	BB	4.0 tok/s	51.8 GB
Kimi K2 InstructMoonshot AI	1000B(32B active)	BB	4.0 tok/s	51.8 GB
Llama 3 70B InstructMeta	70B	BB	4.5 tok/s	45.7 GB
AdVast.aiAffordable on-demand GPU rentals for training and inference. Pick from thousands of hosts.Rent a GPU
GLM-4.7Z.ai	358B(32B active)	BB	3.9 tok/s	52.6 GB
Qwen3.5-397B-A17BAlibaba Cloud (Qwen)	397B(17B active)	BB	4.5 tok/s	46.0 GB
Qwen 3.5 OmniAlibaba Cloud	397B(17B active)	BB	4.6 tok/s	45.2 GB
Llama 2 70B ChatMeta	70B	BB	4.7 tok/s	43.4 GB
Mixtral 8x22B InstructMistral AI	141B(39B active)	BB	4.7 tok/s	43.6 GB
DeepSeek-V3DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
DeepSeek-R1DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
DeepSeek-V3.1DeepSeek	671B(37B active)	BB	3.4 tok/s	59.8 GB
AdRunPodServerless and dedicated GPU cloud built for AI workloads. Spin up instances in seconds.Launch on RunPod
DeepSeek-V3.2DeepSeek	685B(37B active)	BB	3.4 tok/s	59.8 GB

Rows per page

Page 1 of 3

GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB)

Quick Specs

Our Take

Specifications

Overview

AI Performance & Specifications

VRAM: 96 GB Unified

Memory Bandwidth: 256 GB/s

Compute: 50 TOPS (INT8)

Power Efficiency: 120W sustained / 140W peak

Key Specs at a Glance

What Models Can It Run?

Fitting Models in VRAM

Multimodal and Long-Context

Token Speed Expectations

Use Cases & Target Audience

Who Should Buy the EVO-X2?

Who Should Look Elsewhere?

How It Compares

vs. Mac Studio (M2 Ultra, 192 GB)

vs. NVIDIA RTX 4090 (24 GB) in a desktop

vs. Apple Mac Mini M4 Pro (64 GB)

Compatible AI Models

Similar Products

Reatan Mini Gaming PC (Ryzen AI 9 HX 470 with Speaker)

Reatan HTPC (Ryzen AI 9 HX 470 48GB)

Reatan X8 (Ryzen AI 9 HX 470 48GB)

NIMO Mini PC (Ryzen AI Max+ 395 128GB)

GMKtec EVO-X2 (Ryzen AI Max+ 395 128GB)

Quick Specs

Our Take

Specifications

Overview

AI Performance & Specifications

VRAM: 96 GB Unified

Memory Bandwidth: 256 GB/s

Compute: 50 TOPS (INT8)

Power Efficiency: 120W sustained / 140W peak

Key Specs at a Glance

What Models Can It Run?

Fitting Models in VRAM

Multimodal and Long-Context

Token Speed Expectations

Use Cases & Target Audience

Who Should Buy the EVO-X2?

Who Should Look Elsewhere?

How It Compares

vs. Mac Studio (M2 Ultra, 192 GB)

vs. NVIDIA RTX 4090 (24 GB) in a desktop

vs. Apple Mac Mini M4 Pro (64 GB)

Compatible AI Models

Similar Products

Reatan Mini Gaming PC (Ryzen AI 9 HX 470 with Speaker)

Reatan HTPC (Ryzen AI 9 HX 470 48GB)

Reatan X8 (Ryzen AI 9 HX 470 48GB)

NIMO Mini PC (Ryzen AI Max+ 395 128GB)