Mid-tower AI workstation with RTX 5090 32GB and Ryzen 9 9950X. Corsair-cooled, 6TB NVMe, ready for local inference out of the box. Supports up to 4 GPUs, backed by Origin PC lifetime support.
A 70B model fits fully in VRAM at ~3-bit quantization with usable context budget left over; a Q4 quant needs a few layers offloaded. Sweet spot if you want a single card that handles every open model worth running locally today. High TDP: plan for adequate cooling and a beefy PSU; not the right pick for compact desktops.
The Origin PC M-CLASS v2 is a mid-tower AI workstation engineered for practitioners who need to run large language models locally without compromise. Origin PC, a US-based boutique builder with a reputation for lifetime support and hand-built quality, targets this machine squarely at the prosumer and professional AI market. It’s not a data-center server, but it delivers data-center-grade inference throughput in a desk-friendly chassis.
At $6,379, the M-CLASS v2 sits in the premium tier of AI PCs and laptops. It competes directly with high-end custom builds, Lambda Labs’ pre-configured workstations, and the upper echelon of Apple’s Mac Studio with M2 Ultra. Its defining advantage is the NVIDIA GeForce RTX 5090 with 32 GB of GDDR7 VRAM and a massive 1792 GB/s memory bandwidth—numbers that put it ahead of any consumer GPU currently available for local inference. Backed by a 16-core AMD Ryzen 9 9950X, 64 GB of DDR5-6000 RAM, and 6 TB of NVMe storage, this machine is ready to pull models from Hugging Face and start generating tokens out of the box.
The RTX 5090’s 32 GB VRAM is the headline spec. It lets you run 70B-parameter models locally: a Q4 quant weighs roughly 39 GB, so it needs a few layers offloaded to system RAM, while ~3-bit quants (IQ3-class) fit entirely in VRAM with room left over for the KV cache. For smaller models, you can run multiple instances for agentic workflows or batch inference. The 32 GB buffer also handles multimodal models (e.g., LLaVA, Qwen-VL) without swapping to system RAM.
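To make the budgeting concrete, here’s a back-of-envelope sketch; the bits-per-weight figures are rough averages for common GGUF quant families, not exact values:

```python
# Rough VRAM budget for model weights alone (excludes KV cache and
# CUDA/driver overhead). Bits-per-weight are approximate averages.
QUANT_BITS = {"Q8": 8.5, "Q4": 4.5, "IQ3": 3.1}

def weight_gb(params_billions: float, quant: str) -> float:
    """GB of VRAM needed just to hold the quantized weights."""
    return params_billions * QUANT_BITS[quant] / 8

for q in QUANT_BITS:
    print(f"70B @ {q}: ~{weight_gb(70, q):.0f} GB")
# 70B @ Q8:  ~74 GB  -> multi-GPU territory
# 70B @ Q4:  ~39 GB  -> spills past 32 GB; offload a few layers
# 70B @ IQ3: ~27 GB  -> fits, with ~5 GB left for the KV cache
```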
Token generation speed is bottlenecked by memory bandwidth, not compute. At 1792 GB/s, the RTX 5090 delivers approximately 1.8x the bandwidth of an RTX 4090 (1008 GB/s). In practice, this translates to roughly 80–120 tokens per second for a 7B model at Q4, and 20–40 tokens per second for a 70B quantized to fit entirely in VRAM. These are real-time, interactive speeds, not batch throughput.
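The same arithmetic gives a decode-speed ceiling. This is a sketch, assuming every active weight streams from VRAM once per generated token; the efficiency factor is an assumption, not a measurement:

```python
# Decode-speed ceiling: generating one token reads all active weights
# from VRAM once, so tok/s <= bandwidth / bytes_per_token. The 0.6
# efficiency factor is an assumption; small models land further below
# the ceiling because they are kernel-overhead-bound, not bandwidth-bound.
BANDWIDTH_GB_S = 1792  # RTX 5090

def tok_s_ceiling(params_billions: float, bits: float = 4.5,
                  efficiency: float = 0.6) -> float:
    bytes_per_token_gb = params_billions * bits / 8
    return BANDWIDTH_GB_S * efficiency / bytes_per_token_gb

print(f"7B  @ Q4: ~{tok_s_ceiling(7):.0f} tok/s ceiling")   # ~273
print(f"70B @ Q4: ~{tok_s_ceiling(70):.0f} tok/s ceiling")  # ~27
```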
For inference, FP16 TFLOPS matter less than bandwidth, but they become relevant when you run speculative decoding or need to process long context with flash attention. 105 TFLOPS is more than adequate for all current open-weight models. For fine-tuning small adapters (LoRA/QLoRA), this GPU can handle modest training workloads, though it’s not a training-first machine.
The system draws up to 750 W under full GPU load, supplied by a 1200 W Corsair RM1200x SHIFT PSU. The Corsair iCUE LINK TITAN 360 AIO keeps the Ryzen 9 9950X cool during sustained all-core workloads. The chassis is a Sliger M-CLASS mid-tower (7.6" x 18.3" x 18.3") with support for up to four GPUs, meaning you could add three more RTX 5090s for 128 GB of total VRAM and scale to 300B+ models.
| Component | Detail |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 32 GB GDDR7 |
| VRAM | 32 GB |
| Memory Bandwidth | 1792 GB/s |
| FP16 Performance | 105 TFLOPS |
| CPU | AMD Ryzen 9 9950X (16C/32T, up to 5.7 GHz) |
| RAM | 64 GB DDR5-6000 (2x32 GB Corsair Vengeance) |
| Storage | 2 TB PCIe 5.0 (OS) + 4 TB PCIe 4.0 (Data) |
| PSU | 1200 W 80+ Gold |
| Cooling | 360 mm AIO (CPU) |
| GPU Expansion | Up to 4 GPUs (additional cards not included) |
| Warranty | Lifetime labor, 2-year parts |
This is where the M-CLASS v2 earns its keep. Here’s a breakdown by workload, with per-model throughput and VRAM figures in the table at the end of this review.
The 32 GB VRAM handles LLaVA-NeXT-34B (Q4, ~22 GB) with room for image inputs. For Qwen2-VL-72B, you’ll need Q3 or lower. Long-context work is where the budget gets tight: at 128K tokens, Llama 3.1 70B’s KV cache alone runs ~43 GB at FP16, so plan on Flash Attention v2 plus a quantized KV cache, and partial offload at the extreme.
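To see where that number comes from, here’s the standard KV-cache sizing formula applied to Llama 3.1 70B’s published architecture (80 layers, 8 KV heads via GQA, head dimension 128); treat it as a sketch:

```python
# KV-cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # Llama 3.1 70B

def kv_cache_gb(context_tokens: int, bytes_per_elem: float) -> float:
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem
    return context_tokens * per_token / 1e9

print(f"128K ctx, FP16 KV: ~{kv_cache_gb(131072, 2.0):.0f} GB")     # ~43 GB
print(f"128K ctx, Q4 KV:   ~{kv_cache_gb(131072, 0.5625):.0f} GB")  # ~12 GB
```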
If you’re iterating on prompts, testing RAG pipelines, or evaluating model behavior, the M-CLASS v2 eliminates cloud latency and API costs. You can swap between models in seconds, run ablation studies locally, and keep your data private.
For teams building agentic workflows (e.g., LangChain, CrewAI, AutoGen), this machine can host multiple models simultaneously. Run a 7B agent for routing, a 32B for reasoning, and a vision model for image analysis—all on one GPU with CUDA graphs and continuous batching.
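Here’s a minimal sketch of that routing pattern, assuming two OpenAI-compatible local servers (llama.cpp’s llama-server and vLLM both expose this API); the ports and model names are placeholders:

```python
# Route easy queries to a small local model, hard ones to a larger one.
# Both endpoints are assumed OpenAI-compatible; ports are hypothetical.
from openai import OpenAI

small = OpenAI(base_url="http://localhost:8001/v1", api_key="local")  # 7B router
large = OpenAI(base_url="http://localhost:8002/v1", api_key="local")  # 32B reasoner

def answer(query: str) -> str:
    triage = small.chat.completions.create(
        model="local",  # llama-server ignores this; vLLM wants the served name
        messages=[{"role": "user",
                   "content": f"Answer SIMPLE or COMPLEX only:\n{query}"}],
    ).choices[0].message.content or ""
    client = small if "SIMPLE" in triage.upper() else large
    return client.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content or ""
```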
If you want a ChatGPT-level experience without subscriptions or data leaving your machine, the M-CLASS v2 delivers. You can run Llama 3.1 70B (quantized to fit in VRAM) with sub-second time-to-first-token on single-turn prompts.
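For a single local chatbot, something like the following works with llama-cpp-python; the GGUF path is a placeholder, and the quant is chosen small enough to keep the whole model on the GPU:

```python
# One-GPU local chat via llama-cpp-python. The model file is hypothetical;
# pick a quant small enough (~27 GB here) to keep every layer in VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-70b-instruct-iq3.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the RTX 5090
    n_ctx=8192,       # context window; raise as VRAM headroom allows
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain RAG in two sentences."}],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```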
With the ability to add up to three more RTX 5090s, the M-CLASS v2 scales to a 128 GB VRAM inference server. That’s enough to serve a 70B model with multiple concurrent users using vLLM or llama.cpp’s server mode.
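As a sketch of that endgame, vLLM’s Python API shards a model across cards with a single argument; the model choice is illustrative and assumes a hypothetical four-GPU build:

```python
# Tensor-parallel serving across four RTX 5090s (hypothetical 4-GPU build).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative choice
    tensor_parallel_size=4,        # one weight shard per GPU
    gpu_memory_utilization=0.90,   # leave a little headroom per card
)
params = SamplingParams(temperature=0.7, max_tokens=256)
out = llm.generate(["Draft a short release note for v2.1."], params)
print(out[0].outputs[0].text)
```

For a network-facing endpoint, the equivalent `vllm serve <model> --tensor-parallel-size 4` exposes an OpenAI-compatible API.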
This is not a training workstation. 32 GB VRAM is enough for QLoRA fine-tuning of 7B–13B models, but full fine-tuning of 70B is impractical. For training, look at multi-GPU configurations or cloud instances.
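For scale, a QLoRA setup on an 8B model looks roughly like this with Hugging Face transformers + peft + bitsandbytes; the model and LoRA hyperparameters are illustrative, not a tuned recipe:

```python
# QLoRA: 4-bit NF4 base weights, trainable low-rank adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative; 7B-13B is the comfortable range
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```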
Building your own RTX 5090 system would cost roughly $5,500–$6,000 in parts (if you can find the GPU at MSRP). The M-CLASS v2’s $6,379 price includes professional assembly, cable management, stress testing, and lifetime labor support. For practitioners who value time over a few hundred dollars, the pre-built warranty and support are worth the premium.
Lambda Labs offers workstations with NVIDIA A10 (24 GB) or A100 (40 GB) GPUs starting around $8,000. While the A100 has 40 GB VRAM and higher FP16 compute (312 TFLOPS), its memory bandwidth is only 1555 GB/s (vs. 1792 on the RTX 5090). For inference, the RTX 5090 is often faster. The A100 pulls ahead for training. The M-CLASS v2 is the better value for pure inference and agentic workflows.
The Mac Studio with 192 GB unified memory can run 70B models at Q8 without offloading, but its memory bandwidth (800 GB/s) is less than half the RTX 5090’s. Token generation speeds are 2–3x slower. The M-CLASS v2 wins on raw throughput and compatibility with CUDA-based frameworks (vLLM, TensorRT-LLM, llama.cpp). The Mac Studio wins on memory capacity for very large models and power efficiency.
When to pick the M-CLASS v2: You need high tokens-per-second for interactive use, you rely on CUDA-accelerated libraries, and you want the option to scale to multiple GPUs later.
| Model | Developer | Params | Grade | Speed | VRAM |
|---|---|---|---|---|---|
| minimax-m2.5 | MiniMax | 230B (10B active) | S | 63.5 tok/s | 22.7 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | S | 126.9 tok/s | 11.4 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | S | 59.2 tok/s | 24.4 GB |
| Qwen3.5-9B | Alibaba Cloud (Qwen) | 9B | S | 58.7 tok/s | 24.6 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | S | 131.0 tok/s | 11.0 GB |
| — | — | 8B | S | 108.2 tok/s | 13.3 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | S | 169.1 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | S | 169.1 tok/s | 8.5 GB |
| Llama 2 13B Chat | Meta | 13B | S | 170.4 tok/s | 8.5 GB |
| Qwen3.5-122B-A10B | Alibaba Cloud (Qwen) | 122B (10B active) | S | 52.9 tok/s | 27.3 GB |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | S | 267.8 tok/s | 5.4 GB |
| — | — | 9B | S | 239.8 tok/s | 6.0 GB |
| — | — | 8B | A | 254.7 tok/s | 5.7 GB |
| Gemma 4 E4B IT | Google | 4B | A | 208.6 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | A | 208.6 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | A | 225.6 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | A | 301.2 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | A | 389.0 tok/s | 3.7 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 37.0 tok/s | 39.0 GB |
| Qwen3.6-27B | Alibaba Cloud | 27B | F | 19.8 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 32.9 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba Cloud (Qwen) | 27B | F | 19.8 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 17.6 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba Cloud (Qwen) | 32.8B | F | 26.8 tok/s | 53.9 GB |
| LLaMA 65B | Meta | 65B | F | 36.7 tok/s | 39.3 GB |