
A compact AI edge device featuring a 4TB PCIe Gen 5.0 SSD, highly optimized for thermal efficiency and dual-system stacking.
The ASUS Ascent GX10 - 4TB is a high-density, small-form-factor AI workstation designed to bridge the gap between consumer desktops and enterprise-grade rack servers. Built around the NVIDIA Grace Blackwell (GB10) Superchip, this device integrates an Armv9.2-A CPU with a Blackwell-architecture GPU. For AI engineers and researchers, the standout feature of this 4TB SKU is its PCIe Gen 5.0 x4 SSD, which provides the I/O throughput needed to load large model weights into memory rapidly.
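To make the storage claim concrete, here is a back-of-envelope sketch of checkpoint load times. The bandwidth figures (~12 GB/s sequential read for a Gen 5.0 x4 drive, ~0.5 GB/s for SATA) are illustrative assumptions, not measured GX10 numbers:

```python
# Back-of-envelope: time to stream model weights from SSD into memory.
# Bandwidth figures below are illustrative assumptions, not GX10 measurements.

def load_time_s(params_b: float, bits_per_weight: float, read_gb_s: float) -> float:
    """Seconds to read a checkpoint of `params_b` billion weights from disk."""
    size_gb = params_b * bits_per_weight / 8  # on-disk size in GB (ignoring metadata)
    return size_gb / read_gb_s

# A 70B model quantized to 4 bits is ~35 GB on disk.
print(f"Gen 5.0 x4 NVMe (~12 GB/s): {load_time_s(70, 4, 12.0):.1f} s")
print(f"SATA SSD (~0.5 GB/s):       {load_time_s(70, 4, 0.5):.1f} s")
```

The gap matters most for workflows that swap models frequently: at Gen 5 speeds a quantized 70B checkpoint streams in seconds rather than minutes.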
Positioned as a "Desktop AI Supercomputer," the GX10 competes directly with the Mac Studio (M2/M3 Ultra) and specialized edge AI boxes from vendors like GIGABYTE and Dell. However, the GX10 distinguishes itself through its thermal design—utilizing a dual vapor chamber and a three-fan array—and its focus on "dual-system stacking," allowing teams to scale local compute by physically stacking units with coherent interconnects.
For AI workloads, the hardware's value is defined by its memory architecture and compute throughput. The GX10 delivers 250 INT8 TOPS, making it well suited to quantized inference at the edge.
The 128 GB of unified memory is the primary selling point of the ASUS Ascent GX10 - 4TB: because the Grace CPU and Blackwell GPU share a single coherent pool, practitioners can run models that are typically reserved for data-center hardware.
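A quick way to see what 128 GB buys you is to compute raw weight footprints at common quantization levels. The sketch below counts weights only; KV cache, activations, and runtime overhead (which it ignores) consume additional gigabytes in practice:

```python
# Rough weight-memory footprint at common quantization levels.
# Counts weights only; KV cache and runtime overhead are ignored.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """GB of memory occupied by `params_b` billion weights at the given precision."""
    return params_b * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"70B model @ {bits}-bit: {weights_gb(70, bits):.0f} GB of weights")
```

The arithmetic explains the 70B+ claim: at 8-bit a 70B model needs 70 GB of weights and fits in 128 GB with headroom for KV cache, while at full 16-bit precision (140 GB) it does not.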
While exact tokens per second depend on the quantization method (K-Quants vs. AWQ), the benchmark table below gives representative throughput figures.
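As a rough guide, single-stream decode speed on hardware like this is bounded by memory bandwidth divided by the bytes read per generated token. The sketch below assumes a 273 GB/s memory bus and 60% achieved efficiency; both figures are assumptions for illustration, not GX10 specifications:

```python
# Upper-bound decode throughput: each generated token reads all active weights once.
# The 273 GB/s bandwidth and 0.6 efficiency are assumptions, not GX10 specs.

def max_tok_s(active_params_b: float, bits_per_weight: float,
              bandwidth_gb_s: float, efficiency: float = 0.6) -> float:
    """Estimated tokens/s when decoding is memory-bandwidth bound."""
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s * efficiency / gb_read_per_token

# A dense 70B model vs. a 30B MoE with only 3B active parameters, both at 4-bit.
print(f"70B dense @ 4-bit:          {max_tok_s(70, 4, 273):.1f} tok/s")
print(f"30B MoE (3B active) @ 4-bit: {max_tok_s(3, 4, 273):.1f} tok/s")
```

This is also why mixture-of-experts models punch above their weight on this class of hardware: throughput scales with *active* parameters per token, not total parameters.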
The ASUS Ascent GX10 - 4TB is not a gaming machine or a general-purpose office PC; it is a specialized tool for the following personas:
When evaluating the ASUS Ascent GX10 - 4TB, practitioners typically look at two main alternatives:
The Mac Studio is the GX10's closest competitor in the prosumer space.
The ASUS Ascent GX10 - 4TB is a purpose-built inference engine. For teams that need to run 70B+ parameter models locally with a native NVIDIA software stack, but cannot justify the cost or power requirements of an H100, the GX10 is the current benchmark for compact AI compute.
| Model | Developer | Parameters | Rating | Throughput | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | ByteDance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |