Ultra-compact 4.4L workstation built around the AMD Ryzen AI Max 385 APU: 48GB of unified VRAM carved from 64GB of LPDDR5X-8000, in a silent chassis with a 300W power budget. The XDNA 2 NPU adds 50 TOPS for on-device acceleration.
The first tier where 70B-class models stop feeling cramped. Headroom for KV cache means 32K+ context on Q4 quants without falling off the GPU.
The Corsair AI Workstation 300 (Ryzen AI Max 385) is a purpose-built, ultra-compact local AI inference machine. At $1,699, it occupies a specific niche: a prosumer-grade system that eliminates the need for a discrete GPU by leveraging AMD’s unified memory architecture. Corsair, through its Origin PC subsidiary, targets developers and engineers who need to run large language models (LLMs) on-device without the bulk, noise, or power draw of a traditional workstation.
This is not a data center server nor a consumer gaming PC. It’s a 4.4-liter small form factor (SFF) system that fits on a desk, runs silently at ~150W, and provides 48GB of unified VRAM for AI workloads. Its primary competition includes other high-end mini PCs with integrated AI acceleration (e.g., the Apple Mac Studio with M4 Max) and entry-level discrete GPU workstations (e.g., a custom build with an RTX 4060 Ti 16GB). The Corsair AI Workstation 300 wins on VRAM capacity per dollar and power efficiency, but it is not a general-purpose compute node for training.
The core of this system is the AMD Ryzen AI Max 385 APU, an 8-core/16-thread processor paired with a Radeon 8050S integrated GPU and an XDNA 2 NPU. For AI inference, the critical spec is the unified memory architecture: 64GB of LPDDR5X-8000 memory is shared between CPU and GPU, with up to 48GB dynamically allocated as VRAM. This provides 256 GB/s of memory bandwidth, which directly determines token generation speed.
Key AI-specific specs:

- APU: AMD Ryzen AI Max 385, 8 cores / 16 threads
- iGPU: Radeon 8050S, the primary engine for LLM inference (ROCm or DirectML)
- NPU: XDNA 2, 50 TOPS
- Memory: 64GB LPDDR5X-8000, unified; up to 48GB dynamically allocatable as VRAM
- Memory bandwidth: 256 GB/s
- Power: ~150W typical draw
- Volume: 4.4L
The 256 GB/s bandwidth is the bottleneck for inference throughput. For comparison, an RTX 4090 offers ~1,000 GB/s, and an RTX 4060 Ti offers ~288 GB/s. This means the Corsair AI Workstation 300 will generate tokens slower than a discrete GPU solution, but it compensates with substantially more VRAM. The 50 TOPS NPU provides on-device acceleration for lightweight models and preprocessing, but the Radeon 8050S iGPU handles the bulk of LLM inference via ROCm or DirectML.
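A back-of-envelope calculation makes the bandwidth ceiling concrete: during decode, each generated token streams roughly the full set of quantized weights through memory once, so tokens/second is bounded by bandwidth divided by model size. The sketch below is illustrative only; the model sizes are typical GGUF file sizes and the efficiency factor is an assumption, not a measurement.

```python
# Decode throughput is memory-bandwidth-bound: each new token streams
# (roughly) all quantized weights through memory once, so
#   tokens/s  <=  bandwidth / weight_size.
# Sizes below are typical GGUF file sizes (assumptions, not specs).

BANDWIDTH_GB_S = 256   # LPDDR5X-8000 on this system
EFFICIENCY = 0.8       # assumed fraction of the theoretical peak

models_gb = {
    "7B  @ Q4_K_M": 4.1,
    "32B @ Q5_K_M": 23.0,
    "70B @ Q4_K_M": 42.5,
}

for name, size_gb in models_gb.items():
    ceiling = BANDWIDTH_GB_S / size_gb
    print(f"{name}: ceiling {ceiling:5.1f} tok/s, "
          f"expected ~{EFFICIENCY * ceiling:4.1f} tok/s")
```

The assumed 80% factor lines up with the measured table at the end of this page: Llama 2 70B Chat at 43.4GB has a 5.9 tok/s theoretical ceiling and clocks in at 4.7 tok/s.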
The 150W typical power draw is a significant advantage. This system can run 24/7 for edge deployments or local agentic workflows without special cooling or high electricity costs. The passive cooling design keeps noise to a minimum.
The 48GB VRAM ceiling defines model compatibility. Here is a realistic breakdown of what fits and what doesn’t (measured figures for specific models appear in the benchmark table at the end of this page):

Fits comfortably (high-precision quants):
- 7B-14B dense models at Q6-Q8 (e.g., Mistral 7B, Llama 2 13B): roughly 5-16GB of weights, leaving ample headroom for long context
- Small MoE models such as Mixtral 8x7B (~11GB in the benchmarks below)

Fits with quantization (sweet spot for quality/speed):
- 24B-32B models at Q4-Q5: weights in the mid-teens to mid-20s of GB, with room left for 8-16K tokens of context

At the limit:
- 70B-class models at Q3-Q4: roughly 34-43GB of weights (the benchmarks below measure Llama 2 70B Chat at 43.4GB), leaving only a few GB for KV cache

Multimodal models:
- Vision-language models fit whenever their LLM backbone does; the vision encoder typically adds only 1-2GB on top of the base model’s footprint
The sweet spot for this hardware is 32B-parameter models at Q5_K_M, which balance reasoning capability, context length (8-16K tokens), and inference speed. For 70B models you must accept Q3-Q4 quantization and limited context, which restricts them to batch-style tasks like summarization or classification rather than complex multi-turn agents.
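To see why 70B models sit at the limit, it helps to budget the KV cache explicitly. The sketch below is a rough fit check, assuming a Llama-style 70B with grouped-query attention and an fp16 cache; all architecture numbers are illustrative assumptions, not Corsair specs.

```python
# Rough fit check: quantized weights + KV cache must stay under the
# 48GB VRAM allocation. KV cache per token = 2 (K and V) * n_layers *
# n_kv_heads * head_dim * bytes_per_value. Figures are illustrative
# assumptions for a Llama-style 70B with grouped-query attention.

VRAM_GB = 48

weights_gb = 42.5   # 70B at ~Q4_K_M (assumed GGUF size)
n_layers   = 80
n_kv_heads = 8      # grouped-query attention
head_dim   = 128
kv_bytes   = 2      # fp16 cache

per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes  # bytes
budget_gb = VRAM_GB - weights_gb
max_context = int(budget_gb * 1024**3 / per_token)

print(f"KV cache: {per_token / 1024:.0f} KiB/token")
print(f"Headroom: {budget_gb:.1f} GB -> ~{max_context:,} tokens of context")
```

With ~42.5GB of Q4 weights, about 5.5GB remains, which works out to roughly 18K tokens of context; quantizing the cache to 8-bit roughly doubles that.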
The Corsair AI Workstation 300 is not for everyone. It serves specific, well-defined use cases:
Local LLM Inference for Developers: If you are building AI-powered applications that require local inference for privacy, latency, or offline capability, this system provides enough VRAM to run 32B models at usable speeds. It is ideal for prototyping agentic workflows, RAG pipelines, or local chatbots without cloud costs.
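As a concrete starting point, here is a minimal llama-cpp-python sketch for this kind of setup. The GGUF path and context size are placeholders, and it assumes a GPU-enabled build of llama-cpp-python (e.g., Vulkan or ROCm/HIP) so layers can be offloaded to the iGPU.

```python
# Minimal local-inference sketch with llama-cpp-python. Assumes a
# GPU-enabled build and a GGUF file you have already downloaded;
# the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-32b-instruct-q5_k_m.gguf",  # placeholder
    n_gpu_layers=-1,   # offload every layer to the Radeon 8050S iGPU
    n_ctx=16384,       # fits alongside ~23GB of Q5_K_M 32B weights
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Summarize unified memory in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```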
Edge AI Deployment: The 4.4L form factor, 150W power draw, and silent operation make this suitable for edge deployments in labs, clinics, or industrial settings where space and noise are constrained. The XDNA 2 NPU adds 50 TOPS for lightweight on-device acceleration (e.g., Whisper for speech-to-text, YOLO for object detection).
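As an example of the speech-to-text workload, the sketch below runs Whisper locally via faster-whisper. This is a CPU-path illustration; actually routing inference to the XDNA 2 NPU goes through AMD’s Ryzen AI software stack, which is not shown here. The audio filename is a placeholder.

```python
# Lightweight on-device speech-to-text sketch using faster-whisper on
# the CPU. Running this on the XDNA 2 NPU requires AMD's Ryzen AI SDK
# and is not shown; this illustrates the workload itself.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("meeting.wav")  # placeholder file

print(f"Detected language: {info.language}")
for seg in segments:
    print(f"[{seg.start:6.1f}s -> {seg.end:6.1f}s] {seg.text.strip()}")
```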
Hobbyists and Researchers Running Local Agents: For practitioners who need to run multiple models simultaneously or maintain long-running inference servers, the unified memory allows swapping between models without VRAM contention. A single system can serve a local LLM API endpoint (e.g., llama.cpp, Ollama) while running a separate embedding model for RAG.
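A minimal sketch of that dual-model pattern, assuming Ollama is serving on its default port; both model names are assumptions, so substitute whatever you have pulled locally.

```python
# One box serving both a chat model and an embedding model through
# Ollama's local HTTP API. Model names are assumptions; pull whatever
# fits your VRAM budget (e.g., `ollama pull qwen2.5:32b`).
import requests

OLLAMA = "http://localhost:11434"

# Generation request against the main LLM.
gen = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "qwen2.5:32b",          # assumed chat model
    "prompt": "List three uses of a local RAG pipeline.",
    "stream": False,
}).json()
print(gen["response"])

# Embedding request against a small model loaded alongside it.
emb = requests.post(f"{OLLAMA}/api/embeddings", json={
    "model": "nomic-embed-text",     # assumed embedding model
    "prompt": "unified memory architecture",
}).json()
print(len(emb["embedding"]), "dimensions")
```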
Not suitable for: Training from scratch, fine-tuning large models, or running models larger than 70B. The 256 GB/s bandwidth limits throughput for real-time applications like voice assistants. If you need high tokens/second for production inference, a discrete GPU workstation is a better choice.
vs. Apple Mac Studio (M4 Max, 48GB unified memory):
The Mac Studio offers similar VRAM capacity and roughly twice the memory bandwidth (546 GB/s in the 48GB M4 Max configuration vs. 256 GB/s). It runs Llama 3.1 70B at Q3 faster (10-15 tokens/second vs. 5-10). However, the Corsair AI Workstation 300 runs Windows natively, supports ROCm and DirectML, and is easier to integrate into existing x86-based development pipelines. The Mac Studio is better for macOS-native workflows and creative apps; the Corsair is better for Windows/Linux AI development and edge deployment.
vs. Custom Mini PC with RTX 4060 Ti 16GB:
A custom SFF build with an RTX 4060 Ti (16GB VRAM) costs ~$1,200-1,400 and offers higher memory bandwidth (288 GB/s). It will run 7B-13B models faster (50+ tokens/second) but cannot load anything larger than 16GB. The Corsair AI Workstation 300 wins on VRAM capacity—you can run 32B models that the RTX 4060 Ti simply cannot. For developers who need to experiment with larger models, the Corsair is the better buy. For high-throughput inference on small models, the discrete GPU build is faster.
When to pick the Corsair AI Workstation 300: You need to run 32B-70B parameter models locally, you prioritize VRAM capacity over raw speed, and you want a silent, low-power system that fits on a desk. It is the best hardware for local AI agents in 2026 if your workflow requires models larger than 16GB.
Measured model throughput and memory footprint on this system:

| Model | Developer | Parameters | Grade | Speed | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | S | 38.3 tok/s | 5.4 GB |
| | | 8B | A | 36.4 tok/s | 5.7 GB |
| Llama 2 7B Chat | Meta | 7B | A | 43.0 tok/s | 4.8 GB |
| | | 9B | A | 34.3 tok/s | 6.0 GB |
| Gemma 4 E2B IT | Google | 2B | A | 55.6 tok/s | 3.7 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | A | 24.2 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | A | 24.2 tok/s | 8.5 GB |
| Mistral 7B Instruct | Mistral AI | 7B | A | 32.2 tok/s | 6.4 GB |
| Llama 2 13B Chat | Meta | 13B | A | 24.3 tok/s | 8.5 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | A | 18.1 tok/s | 11.4 GB |
| Gemma 4 E4B IT | Google | 4B | A | 29.8 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | A | 29.8 tok/s | 6.9 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | A | 18.7 tok/s | 11.0 GB |
| minimax-m2.5 | MiniMax | 230B (10B active) | B | 9.1 tok/s | 22.7 GB |
| Qwen3.5-122B-A10B | Alibaba Cloud (Qwen) | 122B (10B active) | B | 7.6 tok/s | 27.3 GB |
| Qwen3-235B-A22B | Alibaba Cloud (Qwen) | 235B (22B active) | B | 5.7 tok/s | 36.3 GB |
| | | 8B | B | 15.5 tok/s | 13.3 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | B | 8.5 tok/s | 24.4 GB |
| Qwen3.5-9B | Alibaba Cloud (Qwen) | 9B | B | 8.4 tok/s | 24.6 GB |
| Mistral Small 3 24B | Mistral AI | 24B | B | 5.3 tok/s | 39.0 GB |
| LLaMA 65B | Meta | 65B | B | 5.2 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | C | 4.7 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | C | 4.7 tok/s | 43.6 GB |
| Qwen 3.5 Omni | Alibaba Cloud | 397B (17B active) | C | 4.6 tok/s | 45.2 GB |
| | | 70B | C | 4.5 tok/s | 45.7 GB |