Mid-tower AI workstation with RTX 5090 32GB and Ryzen 9 9950X. Corsair-cooled, 6TB NVMe, ready for local inference out of the box. Supports up to 4 GPUs, backed by Origin PC lifetime support.
A 70B model fits fully in VRAM at ~3-bit quantization with usable context budget left over; a Q4 quant needs a few layers offloaded. Sweet spot if you want a single card that handles every open model worth running locally today. High TDP: plan for adequate cooling and a beefy PSU; not the right pick for compact desktops.
The Origin PC M-CLASS v2 is a mid-tower AI workstation engineered for practitioners who need to run large language models locally without compromise. Origin PC, a US-based boutique builder with a reputation for lifetime support and hand-built quality, targets this machine squarely at the prosumer and professional AI market. It’s not a data-center server, but it delivers data-center-grade inference throughput in a desk-friendly chassis.
At $6,379, the M-CLASS v2 sits in the premium tier of AI PCs and laptops. It competes directly with high-end custom builds, Lambda Labs’ pre-configured workstations, and the upper echelon of Apple’s Mac Studio with M2 Ultra. Its defining advantage is the NVIDIA GeForce RTX 5090 with 32 GB of GDDR7 VRAM and a massive 1792 GB/s memory bandwidth—numbers that put it ahead of any consumer GPU currently available for local inference. Backed by a 16-core AMD Ryzen 9 9950X, 64 GB of DDR5-6000 RAM, and 6 TB of NVMe storage, this machine is ready to pull models from Hugging Face and start generating tokens out of the box.
The RTX 5090’s 32 GB VRAM is the headline spec. It lets you run 70B-parameter models locally: a Q4 quant weighs roughly 39 GB, so it needs a few layers offloaded to system RAM, while ~3-bit quants (IQ3-class) fit entirely in VRAM with room left over for the KV cache. For smaller models, you can run multiple instances for agentic workflows or batch inference. The 32 GB buffer also handles multimodal models (e.g., LLaVA, Qwen-VL) without swapping to system RAM.
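To make the budgeting concrete, here’s a back-of-envelope sketch; the bits-per-weight figures are rough averages for common GGUF quant families, not exact values:

```python
# Rough VRAM budget for model weights alone (excludes KV cache and
# CUDA/driver overhead). Bits-per-weight are approximate averages.
QUANT_BITS = {"Q8": 8.5, "Q4": 4.5, "IQ3": 3.1}

def weight_gb(params_billions: float, quant: str) -> float:
    """GB of VRAM needed just to hold the quantized weights."""
    return params_billions * QUANT_BITS[quant] / 8

for q in QUANT_BITS:
    print(f"70B @ {q}: ~{weight_gb(70, q):.0f} GB")
# 70B @ Q8:  ~74 GB  -> multi-GPU territory
# 70B @ Q4:  ~39 GB  -> spills past 32 GB; offload a few layers
# 70B @ IQ3: ~27 GB  -> fits, with ~5 GB left for the KV cache
```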
Token generation speed is bottlenecked by memory bandwidth, not compute. At 1792 GB/s, the RTX 5090 delivers approximately 1.8x the bandwidth of an RTX 4090 (1008 GB/s). In practice, this translates to roughly 80–120 tokens per second for a 7B model at Q4, and 20–40 tokens per second for a 70B quantized to fit entirely in VRAM. These are real-time, interactive speeds, not batch throughput.
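The same arithmetic gives a decode-speed ceiling. This is a sketch, assuming every active weight streams from VRAM once per generated token; the efficiency factor is an assumption, not a measurement:

```python
# Decode-speed ceiling: generating one token reads all active weights
# from VRAM once, so tok/s <= bandwidth / bytes_per_token. The 0.6
# efficiency factor is an assumption; small models land further below
# the ceiling because they are kernel-overhead-bound, not bandwidth-bound.
BANDWIDTH_GB_S = 1792  # RTX 5090

def tok_s_ceiling(params_billions: float, bits: float = 4.5,
                  efficiency: float = 0.6) -> float:
    bytes_per_token_gb = params_billions * bits / 8
    return BANDWIDTH_GB_S * efficiency / bytes_per_token_gb

print(f"7B  @ Q4: ~{tok_s_ceiling(7):.0f} tok/s ceiling")   # ~273
print(f"70B @ Q4: ~{tok_s_ceiling(70):.0f} tok/s ceiling")  # ~27
```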
For inference, FP16 TFLOPS matter less than bandwidth, but they become relevant when you run speculative decoding or need to process long context with flash attention. 105 TFLOPS is more than adequate for all current open-weight models. For fine-tuning small adapters (LoRA/QLoRA), this GPU can handle modest training workloads, though it’s not a training-first machine.
The system draws up to 750 W under full GPU load, supplied by a 1200 W Corsair RM1200x SHIFT PSU. The Corsair iCUE LINK TITAN 360 AIO keeps the Ryzen 9 9950X cool during sustained all-core workloads. The chassis is a Sliger M-CLASS mid-tower (7.6" x 18.3" x 18.3") with support for up to four GPUs, meaning you could add three more RTX 5090s for 128 GB of total VRAM and scale to 300B+ models.
| Component | Detail |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 32 GB GDDR7 |
| VRAM | 32 GB |
| Memory Bandwidth | 1792 GB/s |
| FP16 Performance | 105 TFLOPS |
| CPU | AMD Ryzen 9 9950X (16C/32T, up to 5.7 GHz) |
| RAM | 64 GB DDR5-6000 (2x32 GB Corsair Vengeance) |
| Storage | 2 TB PCIe 5.0 (OS) + 4 TB PCIe 4.0 (Data) |
| PSU | 1200 W 80+ Gold |
| Cooling | 360 mm AIO (CPU) |
| GPU Expansion | Up to 4 GPUs (additional cards not included) |
| Warranty | Lifetime labor, 2-year parts |
This is where the M-CLASS v2 earns its keep. Here’s a breakdown by workload, with per-model throughput and VRAM figures in the table at the end of this review.
The 32 GB VRAM handles LLaVA-NeXT-34B (Q4, ~22 GB) with room for image inputs. For Qwen2-VL-72B, you’ll need Q3 or lower. Long-context work is where the budget gets tight: at 128K tokens, Llama 3.1 70B’s KV cache alone runs ~43 GB at FP16, so plan on Flash Attention v2 plus a quantized KV cache, and partial offload at the extreme.
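To see where that number comes from, here’s the standard KV-cache sizing formula applied to Llama 3.1 70B’s published architecture (80 layers, 8 KV heads via GQA, head dimension 128); treat it as a sketch:

```python
# KV-cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # Llama 3.1 70B

def kv_cache_gb(context_tokens: int, bytes_per_elem: float) -> float:
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem
    return context_tokens * per_token / 1e9

print(f"128K ctx, FP16 KV: ~{kv_cache_gb(131072, 2.0):.0f} GB")     # ~43 GB
print(f"128K ctx, Q4 KV:   ~{kv_cache_gb(131072, 0.5625):.0f} GB")  # ~12 GB
```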
If you’re iterating on prompts, testing RAG pipelines, or evaluating model behavior, the M-CLASS v2 eliminates cloud latency and API costs. You can swap between models in seconds, run ablation studies locally, and keep your data private.
For teams building agentic workflows (e.g., LangChain, CrewAI, AutoGen), this machine can host multiple models simultaneously. Run a 7B agent for routing, a 32B for reasoning, and a vision model for image analysis—all on one GPU with CUDA graphs and continuous batching.
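Here’s a minimal sketch of that routing pattern, assuming two OpenAI-compatible local servers (llama.cpp’s llama-server and vLLM both expose this API); the ports and model names are placeholders:

```python
# Route easy queries to a small local model, hard ones to a larger one.
# Both endpoints are assumed OpenAI-compatible; ports are hypothetical.
from openai import OpenAI

small = OpenAI(base_url="http://localhost:8001/v1", api_key="local")  # 7B router
large = OpenAI(base_url="http://localhost:8002/v1", api_key="local")  # 32B reasoner

def answer(query: str) -> str:
    triage = small.chat.completions.create(
        model="local",  # llama-server ignores this; vLLM wants the served name
        messages=[{"role": "user",
                   "content": f"Answer SIMPLE or COMPLEX only:\n{query}"}],
    ).choices[0].message.content or ""
    client = small if "SIMPLE" in triage.upper() else large
    return client.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content or ""
```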
If you want a ChatGPT-level experience without subscriptions or data leaving your machine, the M-CLASS v2 delivers. You can run Llama 3.1 70B (quantized to fit in VRAM) with sub-second time-to-first-token on single-turn prompts.
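For a single local chatbot, something like the following works with llama-cpp-python; the GGUF path is a placeholder, and the quant is chosen small enough to keep the whole model on the GPU:

```python
# One-GPU local chat via llama-cpp-python. The model file is hypothetical;
# pick a quant small enough (~27 GB here) to keep every layer in VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-70b-instruct-iq3.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the RTX 5090
    n_ctx=8192,       # context window; raise as VRAM headroom allows
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain RAG in two sentences."}],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```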
With the ability to add up to three more RTX 5090s, the M-CLASS v2 scales to a 128 GB VRAM inference server. That’s enough to serve a 70B model with multiple concurrent users using vLLM or llama.cpp’s server mode.
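As a sketch of that endgame, vLLM’s Python API shards a model across cards with a single argument; the model choice is illustrative and assumes a hypothetical four-GPU build:

```python
# Tensor-parallel serving across four RTX 5090s (hypothetical 4-GPU build).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative choice
    tensor_parallel_size=4,        # one weight shard per GPU
    gpu_memory_utilization=0.90,   # leave a little headroom per card
)
params = SamplingParams(temperature=0.7, max_tokens=256)
out = llm.generate(["Draft a short release note for v2.1."], params)
print(out[0].outputs[0].text)
```

For a network-facing endpoint, the equivalent `vllm serve <model> --tensor-parallel-size 4` exposes an OpenAI-compatible API.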
This is not a training workstation. 32 GB VRAM is enough for QLoRA fine-tuning of 7B–13B models, but full fine-tuning of 70B is impractical. For training, look at multi-GPU configurations or cloud instances.
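For scale, a QLoRA setup on an 8B model looks roughly like this with Hugging Face transformers + peft + bitsandbytes; the model and LoRA hyperparameters are illustrative, not a tuned recipe:

```python
# QLoRA: 4-bit NF4 base weights, trainable low-rank adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative; 7B-13B is the comfortable range
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```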
Building your own RTX 5090 system would cost roughly $5,500–$6,000 in parts (if you can find the GPU at MSRP). The M-CLASS v2’s $6,379 price includes professional assembly, cable management, stress testing, and lifetime labor support. For practitioners who value time over a few hundred dollars, the pre-built warranty and support are worth the premium.
Lambda Labs offers workstations with NVIDIA A10 (24 GB) or A100 (40 GB) GPUs starting around $8,000. While the A100 has 40 GB VRAM and higher FP16 compute (312 TFLOPS), its memory bandwidth is only 1555 GB/s (vs. 1792 on the RTX 5090). For inference, the RTX 5090 is often faster. The A100 pulls ahead for training. The M-CLASS v2 is the better value for pure inference and agentic workflows.
The Mac Studio with 192 GB unified memory can run 70B models at Q8 without offloading, but its memory bandwidth (800 GB/s) is less than half the RTX 5090’s. Token generation speeds are 2–3x slower. The M-CLASS v2 wins on raw throughput and compatibility with CUDA-based frameworks (vLLM, TensorRT-LLM, llama.cpp). The Mac Studio wins on memory capacity for very large models and power efficiency.
When to pick the M-CLASS v2: You need high tokens-per-second for interactive use, you rely on CUDA-accelerated libraries, and you want the option to scale to multiple GPUs later.
| Model | Developer | Params | Grade | Speed | VRAM |
|---|---|---|---|---|---|
| minimax-m2.5 | MiniMax | 230B (10B active) | S | 63.5 tok/s | 22.7 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | S | 126.9 tok/s | 11.4 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | S | 59.2 tok/s | 24.4 GB |
| Qwen3.5-9B | Alibaba Cloud (Qwen) | 9B | S | 58.7 tok/s | 24.6 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | S | 131.0 tok/s | 11.0 GB |
| — | — | 8B | S | 108.2 tok/s | 13.3 GB |
| Qwen3.6 35B-A3B | Alibaba Cloud | 35B (3B active) | S | 169.1 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | S | 169.1 tok/s | 8.5 GB |
| Llama 2 13B Chat | Meta | 13B | S | 170.4 tok/s | 8.5 GB |
| Qwen3.5-122B-A10B | Alibaba Cloud (Qwen) | 122B (10B active) | S | 52.9 tok/s | 27.3 GB |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | S | 267.8 tok/s | 5.4 GB |
| — | — | 9B | S | 239.8 tok/s | 6.0 GB |
| — | — | 8B | A | 254.7 tok/s | 5.7 GB |
| Gemma 4 E4B IT | Google | 4B | A | 208.6 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | A | 208.6 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | A | 225.6 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | A | 301.2 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | A | 389.0 tok/s | 3.7 GB |
| Mistral Small 3 24B | Mistral AI | 24B | F | 37.0 tok/s | 39.0 GB |
| Qwen3.6-27B | Alibaba Cloud | 27B | F | 19.8 tok/s | 72.8 GB |
| Gemma 3 27B IT | Google | 27B | F | 32.9 tok/s | 43.8 GB |
| Qwen3.5-27B | Alibaba Cloud (Qwen) | 27B | F | 19.8 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | F | 17.6 tok/s | 82.0 GB |
| Qwen3-32B | Alibaba Cloud (Qwen) | 32.8B | F | 26.8 tok/s | 53.9 GB |
| LLaMA 65B | Meta | 65B | F | 36.7 tok/s | 39.3 GB |