Top-spec 4.4L workstation built around the AMD Ryzen AI Max+ 395 APU. Up to 96GB of its 128GB of LPDDR5X-8000 is addressable as unified VRAM, enough to run 70B models locally in near-silence. The closest thing to a Mac Studio in the x86 world.
Sized for serious local serving of 70B–200B-class models at moderate quantization; a 70B at full precision would overflow even the 96GB pool. Overkill for a homelab; the right call when the workload pays for itself in token volume.
The Corsair AI Workstation 300 (Ryzen AI Max+ 395) is a purpose-built, ultra-compact AI workstation from Corsair—a company better known for high-end PC components, now delivering a focused machine for local inference. At $3,399, it targets the prosumer and edge AI market, directly competing with Apple’s Mac Studio (M2 Ultra) and high-end mini PCs like the Intel NUC 13 Extreme. What sets it apart is the AMD Ryzen AI Max+ 395 “Strix Halo” APU, which combines a 16-core CPU, Radeon 8060S iGPU, and XDNA 2 NPU into a single die, sharing 128GB of unified LPDDR5X-8000 memory. The result: a 4.4-liter chassis that can run 70B parameter models at Q5_K_M quantization in near-silence, with zero reliance on discrete GPUs or cloud infrastructure.
This isn’t a gaming rig with RGB. It’s a silent, power-efficient inference engine built for developers who need to run large language models locally—without the noise, heat, or cost of a multi-GPU tower. Corsair has partnered with Origin PC for lifetime support, which adds confidence for teams deploying this in production or research settings.
The specs that matter for AI inference are straightforward:

- APU: AMD Ryzen AI Max+ 395 "Strix Halo", 16 CPU cores
- GPU: integrated Radeon 8060S, with access to up to 96 GB of the memory pool
- NPU: XDNA 2, 50 TOPS
- Memory: 128 GB unified LPDDR5X-8000, 512 GB/s bandwidth
- Power: ~150 W typical
- Chassis: 4.4 liters
- Price: $3,399
The unified memory architecture is the key differentiator. Unlike discrete GPU setups where VRAM is capped at 24 GB (RTX 4090) or 48 GB (RTX 6000 Ada), the Corsair AI Workstation 300 gives the GPU access to up to 96 GB of the total 128 GB pool. This is a game-changer for local LLM inference: you can load models that would otherwise require a multi-GPU server.
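To make that concrete, here is a minimal sketch of loading a large GGUF entirely into the unified pool with the llama-cpp-python bindings. The model path is hypothetical, and a HIP/ROCm-enabled build of the package is assumed:

```python
# Minimal sketch: load a 70B GGUF entirely into iGPU-addressable unified
# memory via the llama-cpp-python bindings. Assumes a HIP/ROCm-enabled
# build of the package; the model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-70b-instruct.Q5_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer; a ~46 GB Q5 quant fits easily in 96 GB
    n_ctx=8192,       # the KV cache also lives in the unified pool
)

out = llm("Explain unified memory in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```

On a discrete GPU, the same call would fail at load time once VRAM runs out; here the only hard limit is the 96 GB GPU allocation.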
Memory bandwidth of 512 GB/s is roughly half that of an RTX 3090 (935 GB/s), but the trade-off buys capacity. For inference, bandwidth directly caps token generation speed: expect roughly 9–10 tokens/second on a 70B model at Q5-class quantization (consistent with the benchmark table below), slower than a heavily quantized 70B on a single RTX 4090 but with far more headroom for larger models.
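You can sanity-check that figure yourself: decode speed on a memory-bound model is bounded by bandwidth divided by the bytes of weights read per token. A back-of-envelope sketch, using the spec-sheet bandwidth and weight footprints from the benchmark table below:

```python
# Back-of-envelope decode ceiling: generating one token streams the active
# weights through memory once, so speed <= bandwidth / weight footprint.
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    return bandwidth_gb_s / weights_gb

BANDWIDTH_GB_S = 512.0  # the spec-sheet bandwidth figure quoted above

for name, weights_gb in [("70B quant, 45.7 GB", 45.7), ("8B quant, 5.7 GB", 5.7)]:
    print(f"{name}: ceiling {decode_ceiling_tok_s(BANDWIDTH_GB_S, weights_gb):.1f} tok/s")
# Measured numbers in the table below land around 80% of these ceilings,
# which is typical once attention compute and runtime overhead are counted.
```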
The NPU adds 50 TOPS for lightweight on-device AI tasks (e.g., speech recognition, real-time transcription), but the heavy lifting for LLMs is done by the iGPU via ROCm. Power consumption is remarkably low: 150W typical means this machine can run 24/7 without thermal throttling or fan noise that would disturb an office or lab.
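A quick way to confirm the ROCm stack actually sees the iGPU and its share of the pool is a sketch like the following, assuming a ROCm build of PyTorch, which exposes HIP devices through the torch.cuda namespace:

```python
# Sketch: confirm the ROCm stack sees the Radeon 8060S and how much of
# the unified pool it can address. Assumes a ROCm build of PyTorch,
# which surfaces HIP devices through the torch.cuda API.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}")
    print(f"Addressable memory: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No HIP/ROCm device visible; check the ROCm install.")
```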
This is where the Corsair AI Workstation 300 justifies its price. The 96 GB of VRAM enables local inference of models that most consumer hardware can't touch; the benchmark table at the end of this review shows exactly what fits and how fast it runs.
The sweet spot for quality-to-speed is Llama 3.1 70B at Q5_K_M. That’s the default recommendation for anyone who wants high-quality chat, code generation, or reasoning without compromising latency. For faster responses on smaller models (7B-13B), you’ll get 50+ tokens/second—useful for real-time agent loops.
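For those real-time agent loops, a small model served through Ollama's local HTTP API is enough. A sketch, assuming `ollama serve` is running on its default port and the model tag below (hypothetical) has already been pulled:

```python
# Sketch of a streaming request against Ollama's local HTTP API.
# Assumes `ollama serve` is running on the default port and that the
# model tag below (hypothetical) has already been pulled.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Summarize unified memory."},
    stream=True,
)
for line in resp.iter_lines():
    if line:  # Ollama streams one JSON object per line
        print(json.loads(line).get("response", ""), end="", flush=True)
```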
Note: This hardware is not designed for training. The iGPU lacks the dedicated matrix-math units (the equivalent of NVIDIA's tensor cores) and the software stack that make backpropagation practical. For fine-tuning, you'd need a discrete GPU (e.g., RTX 4090 or A6000). For inference, it's a beast.
Who should buy this?

- Developers who need to run 70B-class models locally for privacy, latency, or cost reasons
- Prosumer and edge AI builders who want big-model inference without a multi-GPU tower
- Research and production teams that value a silent, ~150W box that can run 24/7, backed by lifetime support
What it’s not for: Training or fine-tuning large models. For that, look at a workstation with NVIDIA RTX 6000 Ada or A100. Also not ideal if you need >120B parameter models at high precision—you’ll need a server with multiple GPUs.
vs. Apple Mac Studio (M2 Ultra, 192 GB unified memory): The Mac Studio offers more total memory (192 GB) and higher bandwidth (800 GB/s), but its GPU is less compatible with mainstream AI frameworks. Many tools (vLLM, llama.cpp, Ollama) have native support for AMD ROCm, while Apple’s Metal backend lags in performance and model support. The Corsair AI Workstation 300 is cheaper ($3,399 vs. $6,999 for 192 GB M2 Ultra) and runs Windows/Linux natively, making it easier to integrate with existing development pipelines. Pick the Mac Studio if you’re already in the Apple ecosystem and need the absolute largest VRAM for 120B+ models.
vs. NVIDIA RTX 4090 (24 GB VRAM): A single RTX 4090 costs ~$1,600 and can run 70B models only at Q2 or Q3—low quality. To match 96 GB, you’d need four RTX 4090s ($6,400) plus a motherboard, PSU, and cooling. The Corsair workstation is cheaper, smaller, and quieter. Pick the RTX 4090 if you also game or need CUDA-optimized training.
vs. Intel NUC 13 Extreme (i9-13900K + RTX 4060): The NUC is a general-purpose mini PC. It can run small models (7B-13B) but lacks the VRAM for 70B. The Corsair workstation is specifically optimized for AI, with unified memory that makes it the closest x86 equivalent to a Mac Studio. Pick the NUC if you need a compact general-purpose machine and don’t require large model inference.
Bottom line: The Corsair AI Workstation 300 (Ryzen AI Max+ 395) is the best AI PC for local LLM inference under $4,000. It fills the gap between consumer GPUs and data-center servers, offering 96 GB of VRAM in a silent, 150W package. For developers who need to run 70B models locally—whether for privacy, latency, or cost—this is the hardware to buy.
Per-model throughput and memory footprint on the AI Workstation 300:

| Model | Developer | Parameters | Tier | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | SS | 48.3 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | SS | 48.3 tok/s | 8.5 GB |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 76.5 tok/s | 5.4 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | SS | 36.3 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | AA | 37.4 tok/s | 11.0 GB |
| Llama 2 13B Chat | Meta | 13B | AA | 48.7 tok/s | 8.5 GB |
| | | 9B | AA | 68.5 tok/s | 6.0 GB |
| | | 8B | AA | 72.8 tok/s | 5.7 GB |
| Gemma 4 E4B IT | Google | 4B | AA | 59.6 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | AA | 59.6 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | AA | 64.5 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 86.1 tok/s | 4.8 GB |
| | | 8B | AA | 30.9 tok/s | 13.3 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 111.2 tok/s | 3.7 GB |
| minimax-m2.5 | MiniMax | 230B (10B active) | AA | 18.2 tok/s | 22.7 GB |
| Qwen3.5-122B-A10B | Alibaba Cloud (Qwen) | 122B (10B active) | AA | 15.1 tok/s | 27.3 GB |
| | | 70B | BB | 9.0 tok/s | 45.7 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | BB | 9.1 tok/s | 45.2 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | BB | 9.0 tok/s | 46.0 GB |
| Llama 2 70B Chat | Meta | 70B | BB | 9.5 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | BB | 9.5 tok/s | 43.6 GB |
| Qwen3-235B-A22B | Alibaba Cloud (Qwen) | 235B (22B active) | BB | 11.3 tok/s | 36.3 GB |
| GLM-4.5 | Z.ai | 355B (32B active) | BB | 8.0 tok/s | 51.8 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | BB | 8.0 tok/s | 51.8 GB |
| GLM-4.7 | Z.ai | 358B (32B active) | BB | 7.8 tok/s | 52.6 GB |