Full-tower AI workstation with RTX 6000 Ada 48GB and Threadripper PRO 7995WX 96-core. 256GB ECC DDR5 and 12TB NVMe. Enterprise-grade for heavy training and inference, supports up to 4 GPUs.
The first tier where 70B-class models stop feeling cramped. Headroom for KV cache means 32K+ context on Q4 quants without falling off the GPU. High TDP — plan for adequate cooling and a beefy PSU; not the right pick for compact desktops.
The Origin PC L-CLASS v2 is a full-tower AI workstation built for practitioners who need to run large language models locally without compromise. This is not a consumer desktop or a prosumer rig—it’s an enterprise-grade machine configured with an NVIDIA RTX 6000 Ada (48GB VRAM) and an AMD Ryzen Threadripper PRO 7995WX (96 cores, 192 threads). At a $33,072 MSRP, it competes directly with pre-built data center workstations from Dell (Precision 7960 Tower) and custom builds targeting heavy inference and light-to-moderate fine-tuning.
What sets the L-CLASS v2 apart is its balance of GPU compute, CPU throughput, and memory bandwidth in a single, supported chassis. It ships with 256GB of ECC DDR5, 12TB of NVMe storage (split across PCIe 5.0 and 4.0), and a 1500W Platinum-rated power supply. For teams that need to run quantized 70B parameter models entirely in VRAM with extended context, or push into 100B+ territory with partial CPU offload, this machine handles it out of the box.
The specs that drive AI inference are clear-cut. Here’s what matters:
48GB of VRAM is the current sweet spot for running large open-weight models locally. It fits a 70B parameter model at 4-bit quantization (roughly 35GB of weights) entirely on the GPU: no offloading, no sharding across cards. That means full attention layers stay in GPU memory, which translates to faster token generation and lower latency, and it leaves room for extended context (32K+ tokens). At FP8, the same 48GB comfortably holds models in the 30B range; 100B+ models need more aggressive quantization or partial CPU offload.
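For a quick sanity check on what fits, here is a back-of-envelope sketch. The layer count, KV-head count, and head dimension are illustrative Llama-70B-style values, not measurements from this machine:

```python
# Back-of-envelope VRAM estimate for a dense decoder model.
# Assumption: weights dominate; architecture numbers (layers, KV heads,
# head_dim) approximate a Llama-style 70B and vary by model.

def weights_gb(params_b: float, bits: int) -> float:
    """Weight memory in GB: parameter count (billions) times bits per weight / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(context: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per: int = 2) -> float:
    """KV cache: 2 tensors (K and V) * layers * kv_heads * head_dim * bytes, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per
    return context * per_token / 1e9

if __name__ == "__main__":
    w = weights_gb(70, 4)        # ~35 GB for a 70B model at Q4
    kv = kv_cache_gb(32_768)     # ~10.7 GB for 32K context at FP16
    print(f"weights {w:.1f} GB + kv {kv:.1f} GB = {w + kv:.1f} GB vs 48 GB")
```

The same arithmetic shows why FP8 does not work at 70B: the weights alone would be ~70GB, well past the 48GB ceiling.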
At 960 GB/s, the RTX 6000 Ada's memory bandwidth is the key factor in token generation speed: single-batch decoding must stream the full set of weights from VRAM for every generated token, so the ceiling is bandwidth divided by model size. For a 70B model at Q4 (about 35GB), that works out to roughly 27 tokens per second in theory, with measured results (see the table below) landing around 17–18, fast enough for interactive use. With batch processing (e.g., for a local inference server), throughput scales with batch size until you hit compute limits.
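That ceiling is simple arithmetic; real-world numbers land below it because of KV-cache traffic and kernel overhead, which this sketch ignores:

```python
# Bandwidth-bound decode ceiling: tokens/s <= bandwidth / bytes read per token.
# Assumption: weight reads dominate; KV-cache traffic and overhead ignored.
BANDWIDTH_GBS = 960  # RTX 6000 Ada memory bandwidth

def decode_ceiling_tps(params_b: float, bits: int) -> float:
    bytes_per_token = params_b * 1e9 * bits / 8
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

print(f"{decode_ceiling_tps(70, 4):.0f} tok/s")   # ~27 tok/s for 70B at Q4
print(f"{decode_ceiling_tps(13, 4):.0f} tok/s")   # ~148 tok/s for 13B at Q4
```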
91.1 TFLOPS at FP16 is sufficient for light fine-tuning and LoRA training on 7B–13B models. For full-parameter training on larger models, you’d want multiple GPUs. The L-CLASS v2 supports up to 4 GPUs, which makes it a viable platform for distributed training on models up to 30B parameters with FSDP or DeepSpeed.
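As a rough feasibility check, the common 6 × N × D approximation for training FLOPs (N parameters, D training tokens) gives ballpark wall-clock times. The 40% sustained utilization figure is an assumption, and note that LoRA mainly saves memory rather than compute:

```python
# Rough fine-tuning time estimate via the standard ~6 * N * D FLOPs rule.
# Assumption: 40% sustained utilization of FP16 throughput (illustrative).
PEAK_FLOPS = 91.1e12   # RTX 6000 Ada FP16
UTILIZATION = 0.40

def train_hours(params_b: float, tokens_b: float) -> float:
    flops = 6 * params_b * 1e9 * tokens_b * 1e9
    return flops / (PEAK_FLOPS * UTILIZATION) / 3600

print(f"{train_hours(7, 0.05):.0f} h")    # 7B model, 50M tokens: ~16 h
print(f"{train_hours(13, 0.05):.0f} h")   # 13B model, 50M tokens: ~30 h
```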
The system draws up to 700W under full GPU load, with the Threadripper PRO adding up to another 350W. The 1500W PSU leaves comfortable headroom for a second GPU; a three- or four-GPU build pushes closer to the supply's limit. Cooling is handled by a SilverStone XE360-TR5 AIO for the CPU and the RTX 6000 Ada's blower-style cooler. The chassis supports up to 12 fans or dual 360mm radiators, so thermal throttling is unlikely under sustained inference loads.
This machine is built for models that require high VRAM and memory bandwidth. The benchmark table at the end of this review breaks down measured throughput by model family and size.
The sweet spot for quality-to-speed on this hardware is 4-bit quantization (Q4) for 70B-class models, which fits entirely in the 48GB of VRAM, and FP8 or FP16 for models in the 30B range and below. For most production use cases (chatbots, RAG pipelines, agentic workflows), a Q4 70B delivers near-full model quality without sacrificing latency.
If you’re deploying a local inference server for a team of 5–10 developers, the L-CLASS v2 can handle concurrent requests with batching. The Threadripper PRO’s 96 cores handle preprocessing, tokenization, and post-processing without creating a CPU bottleneck.
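A minimal sketch of what that looks like from the client side, assuming an OpenAI-compatible server such as vLLM is already running locally; the URL and model name are placeholders:

```python
# Fan out concurrent chat requests to a local OpenAI-compatible server.
# Assumptions: an inference server (e.g., vLLM) is listening at BASE_URL
# with a quantized 70B model loaded; names here are illustrative.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"      # hypothetical local endpoint
client = OpenAI(base_url=BASE_URL, api_key="not-needed-locally")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="llama-70b-q4",              # hypothetical served model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

prompts = [f"Summarize ticket #{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    for answer in pool.map(ask, prompts):  # server batches these internally
        print(answer[:80])
```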
For small-scale production inference (e.g., a customer-facing chatbot with moderate traffic), this machine can serve a single quantized 70B model at roughly 50–60ms per generated token under low concurrency. With 4 GPUs, you can serve multiple model instances or shard a larger model across cards with tensor parallelism, as sketched below.
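On the serving side, a sketch of tensor-parallel loading with vLLM, assuming a 4-GPU build; the model id is an example:

```python
# Shard one large model across all four GPUs with tensor parallelism.
# Assumptions: vLLM is installed and a 4-GPU build exposes the cards as
# CUDA devices 0-3; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # example model
    tensor_parallel_size=4,                  # split weights across 4 GPUs
)
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```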
Light fine-tuning (LoRA, QLoRA) on 7B–13B models is straightforward. For full-parameter fine-tuning on 30B models, you’ll want at least 2 GPUs. The 256GB system memory ensures you can load large datasets without swapping.
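A minimal LoRA setup with Hugging Face peft, assuming transformers and accelerate are installed; the base model and hyperparameters are illustrative, and QLoRA would additionally pass a 4-bit quantization config when loading the base model:

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft + transformers.
# Assumptions: model id and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",          # example 13B base model
    torch_dtype="auto", device_map="auto",
)
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically <1% of base weights
```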
If you have the budget, this is the machine for running uncensored models at home. No cloud costs, no API limits, no data leaving your network.
This system is optimized for inference-first workloads. The single RTX 6000 Ada is not ideal for training large models from scratch—you’d want 4–8 GPUs for that. But for inference, fine-tuning, and agentic workflows, it’s one of the most capable single-GPU workstations available.
The Mac Studio offers more unified memory (192GB) at a lower price point (~$8,000), which lets it run larger quantized models (e.g., 120B at Q4 with more context). However, the RTX 6000 Ada's 960 GB/s memory bandwidth is roughly 20% higher than the M3 Ultra's ~800 GB/s, which translates to faster token generation for models that fit in 48GB. The L-CLASS v2 also supports up to 4 GPUs for scaling, while the Mac Studio is locked to a single SoC.
Pick the L-CLASS v2 when: You need maximum inference speed for quantized models up to 70B, or you plan to add GPUs later. Pick the Mac Studio when: You need to run larger quantized models (120B+) or prioritize unified memory over raw bandwidth.
A custom build with the same components could save you 10–15% on cost, but you lose the lifetime labor warranty and 2-year parts replacement. The L-CLASS v2 also includes a pre-validated cooling solution and a chassis designed for workstation airflow. For a team that can’t afford downtime, the warranty and support justify the premium.
Pick the L-CLASS v2 when: You need a supported, turnkey system with a single point of contact for hardware issues. Pick a custom build when: You’re comfortable managing your own hardware and want to save $3,000–$5,000.
| Model | Publisher | Parameters | Tier | Speed | Memory |
|---|---|---|---|---|---|
| minimax-m2.5 | MiniMax | 230B (10B active) | SS | 34.0 tok/s | 22.7 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | SS | 68.0 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | SS | 70.2 tok/s | 11.0 GB |
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | SS | 90.6 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | SS | 90.6 tok/s | 8.5 GB |
| Qwen3.5-122B-A10B | Alibaba | 122B (10B active) | SS | 28.3 tok/s | 27.3 GB |
| | | 8B | SS | 58.0 tok/s | 13.3 GB |
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | SS | 143.5 tok/s | 5.4 GB |
| Llama 2 13B Chat | Meta | 13B | SS | 91.3 tok/s | 8.5 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | SS | 31.7 tok/s | 24.4 GB |
| Qwen3.5-9B | Alibaba | 9B | SS | 31.4 tok/s | 24.6 GB |
| | | 9B | AA | 128.5 tok/s | 6.0 GB |
| | | 8B | AA | 136.4 tok/s | 5.7 GB |
| Qwen3-235B-A22B | Alibaba | 235B (22B active) | AA | 21.3 tok/s | 36.3 GB |
| Gemma 4 E4B IT | Google | 4B | AA | 111.7 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | AA | 111.7 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | AA | 120.8 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 161.4 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 208.4 tok/s | 3.7 GB |
| Mistral Small 3 24B | Mistral AI | 24B | BB | 19.8 tok/s | 39.0 GB |
| LLaMA 65B | Meta | 65B | BB | 19.7 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | BB | 17.8 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | BB | 17.7 tok/s | 43.6 GB |
| Qwen 3.5 Omni | Alibaba | 397B (17B active) | BB | 17.1 tok/s | 45.2 GB |
| | | 70B | BB | 16.9 tok/s | 45.7 GB |