Full-tower AI workstation with RTX 6000 Ada 48GB and Threadripper PRO 7995WX 96-core. 256GB ECC DDR5 and 12TB NVMe. Enterprise-grade for heavy training and inference, supports up to 4 GPUs.
The first tier where 70B-class models stop feeling cramped. Headroom for KV cache means 32K+ context on Q4 quants without falling off the GPU. High TDP — plan for adequate cooling and a beefy PSU; not the right pick for compact desktops.
The Origin PC L-CLASS v2 is a full-tower AI workstation built for practitioners who need to run large language models locally without compromise. This is not a consumer desktop or a prosumer rig—it’s an enterprise-grade machine configured with an NVIDIA RTX 6000 Ada (48GB VRAM) and an AMD Ryzen Threadripper PRO 7995WX (96 cores, 192 threads). At a $33,072 MSRP, it competes directly with pre-built data center workstations from Dell (Precision 7960 Tower) and custom builds targeting heavy inference and light-to-moderate fine-tuning.
What sets the L-CLASS v2 apart is its balance of GPU compute, CPU throughput, and memory bandwidth in a single, supported chassis. It ships with 256GB of ECC DDR5, 12TB of NVMe storage (split across PCIe 5.0 and 4.0), and a 1500W Platinum-rated power supply. For teams that need to run quantized 70B parameter models entirely in VRAM with extended context, or push into 100B+ territory with partial CPU offload, this machine handles it out of the box.
The specs that drive AI inference are clear-cut. Here’s what matters:
48GB of VRAM is the current sweet spot for running large open-weight models locally. It fits a 70B parameter model at 4-bit quantization (roughly 35GB of weights) entirely on the GPU: no offloading, no sharding across cards. That means full attention layers stay in GPU memory, which translates to faster token generation and lower latency, and it leaves room for extended context (32K+ tokens). At FP8, the same 48GB comfortably holds models in the 30B range; 100B+ models need more aggressive quantization or partial CPU offload.
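For a quick sanity check on what fits, here is a back-of-envelope sketch. The layer count, KV-head count, and head dimension are illustrative Llama-70B-style values, not measurements from this machine:

```python
# Back-of-envelope VRAM estimate for a dense decoder model.
# Assumption: weights dominate; architecture numbers (layers, KV heads,
# head_dim) approximate a Llama-style 70B and vary by model.

def weights_gb(params_b: float, bits: int) -> float:
    """Weight memory in GB: parameter count (billions) times bits per weight / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(context: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per: int = 2) -> float:
    """KV cache: 2 tensors (K and V) * layers * kv_heads * head_dim * bytes, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per
    return context * per_token / 1e9

if __name__ == "__main__":
    w = weights_gb(70, 4)        # ~35 GB for a 70B model at Q4
    kv = kv_cache_gb(32_768)     # ~10.7 GB for 32K context at FP16
    print(f"weights {w:.1f} GB + kv {kv:.1f} GB = {w + kv:.1f} GB vs 48 GB")
```

The same arithmetic shows why FP8 does not work at 70B: the weights alone would be ~70GB, well past the 48GB ceiling.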
At 960 GB/s, the RTX 6000 Ada's memory bandwidth is the key factor in token generation speed: single-batch decoding must stream the full set of weights from VRAM for every generated token, so the ceiling is bandwidth divided by model size. For a 70B model at Q4 (about 35GB), that works out to roughly 27 tokens per second in theory, with measured results (see the table below) landing around 17–18, fast enough for interactive use. With batch processing (e.g., for a local inference server), throughput scales with batch size until you hit compute limits.
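That ceiling is simple arithmetic; real-world numbers land below it because of KV-cache traffic and kernel overhead, which this sketch ignores:

```python
# Bandwidth-bound decode ceiling: tokens/s <= bandwidth / bytes read per token.
# Assumption: weight reads dominate; KV-cache traffic and overhead ignored.
BANDWIDTH_GBS = 960  # RTX 6000 Ada memory bandwidth

def decode_ceiling_tps(params_b: float, bits: int) -> float:
    bytes_per_token = params_b * 1e9 * bits / 8
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

print(f"{decode_ceiling_tps(70, 4):.0f} tok/s")   # ~27 tok/s for 70B at Q4
print(f"{decode_ceiling_tps(13, 4):.0f} tok/s")   # ~148 tok/s for 13B at Q4
```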
91.1 TFLOPS at FP16 is sufficient for light fine-tuning and LoRA training on 7B–13B models. For full-parameter training on larger models, you’d want multiple GPUs. The L-CLASS v2 supports up to 4 GPUs, which makes it a viable platform for distributed training on models up to 30B parameters with FSDP or DeepSpeed.
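As a rough feasibility check, the common 6 × N × D approximation for training FLOPs (N parameters, D training tokens) gives ballpark wall-clock times. The 40% sustained utilization figure is an assumption, and note that LoRA mainly saves memory rather than compute:

```python
# Rough fine-tuning time estimate via the standard ~6 * N * D FLOPs rule.
# Assumption: 40% sustained utilization of FP16 throughput (illustrative).
PEAK_FLOPS = 91.1e12   # RTX 6000 Ada FP16
UTILIZATION = 0.40

def train_hours(params_b: float, tokens_b: float) -> float:
    flops = 6 * params_b * 1e9 * tokens_b * 1e9
    return flops / (PEAK_FLOPS * UTILIZATION) / 3600

print(f"{train_hours(7, 0.05):.0f} h")    # 7B model, 50M tokens: ~16 h
print(f"{train_hours(13, 0.05):.0f} h")   # 13B model, 50M tokens: ~30 h
```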
The system draws up to 700W under full GPU load, with the Threadripper PRO adding up to another 350W. The 1500W PSU leaves comfortable headroom for a second GPU; a three- or four-GPU build pushes closer to the supply's limit. Cooling is handled by a SilverStone XE360-TR5 AIO for the CPU and the RTX 6000 Ada's blower-style cooler. The chassis supports up to 12 fans or dual 360mm radiators, so thermal throttling is unlikely under sustained inference loads.
This machine is built for models that require high VRAM and memory bandwidth. The benchmark table at the end of this review breaks down measured throughput by model family and size.
The sweet spot for quality-to-speed on this hardware is 4-bit quantization (Q4) for 70B-class models, which fits entirely in the 48GB of VRAM, and FP8 or FP16 for models in the 30B range and below. For most production use cases (chatbots, RAG pipelines, agentic workflows), a Q4 70B delivers near-full model quality without sacrificing latency.
If you’re deploying a local inference server for a team of 5–10 developers, the L-CLASS v2 can handle concurrent requests with batching. The Threadripper PRO’s 96 cores handle preprocessing, tokenization, and post-processing without creating a CPU bottleneck.
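A minimal sketch of what that looks like from the client side, assuming an OpenAI-compatible server such as vLLM is already running locally; the URL and model name are placeholders:

```python
# Fan out concurrent chat requests to a local OpenAI-compatible server.
# Assumptions: an inference server (e.g., vLLM) is listening at BASE_URL
# with a quantized 70B model loaded; names here are illustrative.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"      # hypothetical local endpoint
client = OpenAI(base_url=BASE_URL, api_key="not-needed-locally")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="llama-70b-q4",              # hypothetical served model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

prompts = [f"Summarize ticket #{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    for answer in pool.map(ask, prompts):  # server batches these internally
        print(answer[:80])
```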
For small-scale production inference (e.g., a customer-facing chatbot with moderate traffic), this machine can serve a single quantized 70B model at roughly 50–60ms per generated token under low concurrency. With 4 GPUs, you can serve multiple model instances or shard a larger model across cards with tensor parallelism, as sketched below.
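On the serving side, a sketch of tensor-parallel loading with vLLM, assuming a 4-GPU build; the model id is an example:

```python
# Shard one large model across all four GPUs with tensor parallelism.
# Assumptions: vLLM is installed and a 4-GPU build exposes the cards as
# CUDA devices 0-3; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # example model
    tensor_parallel_size=4,                  # split weights across 4 GPUs
)
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```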
Light fine-tuning (LoRA, QLoRA) on 7B–13B models is straightforward. For full-parameter fine-tuning on 30B models, you’ll want at least 2 GPUs. The 256GB system memory ensures you can load large datasets without swapping.
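A minimal LoRA setup with Hugging Face peft, assuming transformers and accelerate are installed; the base model and hyperparameters are illustrative, and QLoRA would additionally pass a 4-bit quantization config when loading the base model:

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft + transformers.
# Assumptions: model id and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",          # example 13B base model
    torch_dtype="auto", device_map="auto",
)
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically <1% of base weights
```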
If you have the budget, this is the machine for running uncensored models at home. No cloud costs, no API limits, no data leaving your network.
This system is optimized for inference-first workloads. The single RTX 6000 Ada is not ideal for training large models from scratch—you’d want 4–8 GPUs for that. But for inference, fine-tuning, and agentic workflows, it’s one of the most capable single-GPU workstations available.
The Mac Studio offers more unified memory (192GB) at a lower price point (~$8,000), which lets it run larger quantized models (e.g., 120B at Q4 with more context). However, the RTX 6000 Ada's 960 GB/s memory bandwidth is roughly 20% higher than the M3 Ultra's ~800 GB/s, which translates to faster token generation for models that fit in 48GB. The L-CLASS v2 also supports up to 4 GPUs for scaling, while the Mac Studio is locked to a single SoC.
Pick the L-CLASS v2 when: You need maximum inference speed for quantized models up to 70B, or you plan to add GPUs later. Pick the Mac Studio when: You need to run larger quantized models (120B+) or prioritize unified memory over raw bandwidth.
A custom build with the same components could save you 10–15% on cost, but you lose the lifetime labor warranty and 2-year parts replacement. The L-CLASS v2 also includes a pre-validated cooling solution and a chassis designed for workstation airflow. For a team that can’t afford downtime, the warranty and support justify the premium.
Pick the L-CLASS v2 when: You need a supported, turnkey system with a single point of contact for hardware issues. Pick a custom build when: You’re comfortable managing your own hardware and want to save $3,000–$5,000.
| Model | Publisher | Parameters | Tier | Speed | Memory |
|---|---|---|---|---|---|
| minimax-m2.5 | MiniMax | 230B (10B active) | SS | 34.0 tok/s | 22.7 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | SS | 68.0 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | SS | 70.2 tok/s | 11.0 GB |
| Qwen3.6 35B-A3B | Alibaba | 35B (3B active) | SS | 90.6 tok/s | 8.5 GB |
| Qwen3.5-35B-A3B | Alibaba | 35B (3B active) | SS | 90.6 tok/s | 8.5 GB |
| Qwen3.5-122B-A10B | Alibaba | 122B (10B active) | SS | 28.3 tok/s | 27.3 GB |
| | | 8B | SS | 58.0 tok/s | 13.3 GB |
| Qwen3-30B-A3B | Alibaba | 30B (3B active) | SS | 143.5 tok/s | 5.4 GB |
| Llama 2 13B Chat | Meta | 13B | SS | 91.3 tok/s | 8.5 GB |
| Falcon 40B Instruct | Technology Innovation Institute | 40B | SS | 31.7 tok/s | 24.4 GB |
| Qwen3.5-9B | Alibaba | 9B | SS | 31.4 tok/s | 24.6 GB |
| | | 9B | AA | 128.5 tok/s | 6.0 GB |
| | | 8B | AA | 136.4 tok/s | 5.7 GB |
| Qwen3-235B-A22B | Alibaba | 235B (22B active) | AA | 21.3 tok/s | 36.3 GB |
| Gemma 4 E4B IT | Google | 4B | AA | 111.7 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | AA | 111.7 tok/s | 6.9 GB |
| Mistral 7B Instruct | Mistral AI | 7B | AA | 120.8 tok/s | 6.4 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 161.4 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 208.4 tok/s | 3.7 GB |
| Mistral Small 3 24B | Mistral AI | 24B | BB | 19.8 tok/s | 39.0 GB |
| LLaMA 65B | Meta | 65B | BB | 19.7 tok/s | 39.3 GB |
| Llama 2 70B Chat | Meta | 70B | BB | 17.8 tok/s | 43.4 GB |
| Mixtral 8x22B Instruct | Mistral AI | 141B (39B active) | BB | 17.7 tok/s | 43.6 GB |
| Qwen 3.5 Omni | Alibaba | 397B (17B active) | BB | 17.1 tok/s | 45.2 GB |
| | | 70B | BB | 16.9 tok/s | 45.7 GB |