
A mid-tier GB10 workstation that balances edge storage needs with affordability, pairing the platform with a 2TB PCIe Gen 4.0 NVMe drive.
The ASUS Ascent GX10 - 2TB is a compact, high-density AI workstation built around the NVIDIA GB10 (Grace Blackwell) Superchip. By integrating an Armv9.2-A CPU with a Blackwell-based GPU and 128GB of unified LPDDR5x memory, ASUS has created a powerful edge inference node that fits within a 150 x 150 x 51 mm footprint. While the GX10 is available in multiple storage tiers, the 2TB version is the mid-tier sweet spot for practitioners who need a local cache for large model weights without paying the premium for the PCIe 5.0 NVMe 4TB flagship.
This hardware occupies the professional "Edge AI" tier, bridging the gap between high-end consumer GPUs like the RTX 5090 and enterprise rackmount servers. It competes directly with the Apple Mac Studio (M2/M3 Ultra) and the GIGABYTE MS01, but offers a distinct advantage for those integrated into the NVIDIA/CUDA ecosystem. For AI engineers and teams building agentic workflows, the GX10 provides a stable, Linux-native environment (Ubuntu or NVIDIA DGX OS) optimized for the Blackwell architecture.
The core value proposition of the ASUS Ascent GX10 is the NVIDIA GB10 Superchip, which utilizes NVLink-C2C to provide coherent memory access between the CPU and GPU. For AI workloads, this eliminates the PCIe bottleneck often found in traditional discrete GPU setups.
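To put the coherent-memory advantage in perspective, here is a back-of-the-envelope sketch of what a discrete-GPU setup would pay just to stage quantized weights over PCIe Gen 4 x16. The ~2 GB/s-per-lane figure is the nominal spec and ignores protocol overhead; the 40 GB payload is a ballpark for a 4-bit 70B model.

```python
def pcie_transfer_s(payload_gb: float, lanes: int = 16,
                    gen4_gbps_per_lane: float = 2.0) -> float:
    """Seconds to stage a payload over PCIe Gen 4.
    ~2 GB/s per lane nominal, so ~32 GB/s for x16 (overhead ignored)."""
    return payload_gb / (lanes * gen4_gbps_per_lane)

# Staging ~40 GB of 4-bit 70B weights to a discrete GPU over PCIe 4.0 x16:
print(round(pcie_transfer_s(40), 2))  # 1.25 (seconds per full transfer)
```

With NVLink-C2C coherent memory, that staging copy is avoided entirely: CPU and GPU address the same 128GB pool.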
The ASUS Ascent GX10 - 2TB is designed for large language models (LLMs) that exceed the capacity of standard 16GB or 24GB consumer cards. Its 128GB of unified memory, fully addressable by the GPU, allows local execution of models with up to roughly 200B parameters at 4-bit quantization.
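A rough capacity check makes that 200B figure concrete. The sketch below estimates weight memory for a quantized model; the 20% overhead factor for KV cache and runtime buffers is an assumption, not a measured value.

```python
def model_memory_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Estimate memory (GiB) for a model with params_b billion parameters
    quantized to `bits` bits per weight, with ~20% overhead for the
    KV cache and runtime buffers (a rule of thumb, not a spec)."""
    bytes_per_param = bits / 8
    return params_b * 1e9 * bytes_per_param * overhead / 2**30

# A 70B model at 4-bit fits comfortably in 128 GB of unified memory:
print(round(model_memory_gb(70, 4), 1))   # 39.1
# A 200B model at 4-bit sits near the practical ceiling:
print(round(model_memory_gb(200, 4), 1))  # 111.8
```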
For a 70B-parameter model at 4-bit quantization, the GX10 typically yields 15–25 tokens per second, depending on the inference runtime (TensorRT-LLM vs. llama.cpp). This is well above human reading speed and sufficient for real-time agentic tool use.
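The reading-speed comparison is easy to verify. Assuming roughly 0.75 English words per token (a common tokenizer rule of thumb, not an exact figure):

```python
def tokens_to_wpm(tok_per_s: float, words_per_token: float = 0.75) -> float:
    """Convert decode throughput to words per minute; ~0.75 words/token
    is a rough average for English tokenizers (an assumption)."""
    return tok_per_s * words_per_token * 60

# 15-25 tok/s comfortably exceeds typical human reading speed (~250 wpm):
print(tokens_to_wpm(15))  # 675.0
print(tokens_to_wpm(25))  # 1125.0
```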
For developers building agentic workflows (AutoGPT, CrewAI, or LangGraph), the GX10 serves as a reliable "brain." Local deployment ensures that sensitive API keys, proprietary data, and internal tool-use logs never leave the local network. The 128GB VRAM is particularly useful when running multiple specialized models simultaneously (e.g., a primary reasoning model alongside a smaller embedding model and a vision model).
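As a minimal sketch of that multi-model setup, the snippet below budgets a hypothetical resident set of local models against the 128GB unified-memory pool. The endpoints, ports, and per-model footprints are illustrative placeholders, not measured values.

```python
# Hypothetical local model registry for an agentic stack: one reasoning
# model, one embedder, one vision model, all served on the same box.
LOCAL_MODELS = {
    "reason": {"endpoint": "http://localhost:8000/v1", "vram_gb": 42},
    "embed":  {"endpoint": "http://localhost:8001/v1", "vram_gb": 5},
    "vision": {"endpoint": "http://localhost:8002/v1", "vram_gb": 9},
}

def fits_in_memory(models: dict, budget_gb: int = 128) -> bool:
    """Check that the resident set of models fits the unified-memory budget."""
    return sum(m["vram_gb"] for m in models.values()) <= budget_gb

print(fits_in_memory(LOCAL_MODELS))  # True (56 GB used of 128 GB)
```

Keeping all three resident avoids model-swap latency between agent steps, which is where the large unified pool earns its keep.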
The rugged, compact chassis and 10G networking make the GX10 a candidate for Edge AI. It can be deployed in retail, manufacturing, or healthcare settings to process telemetry or video feeds locally. The ConnectX-7 SmartNIC ensures that data ingestion doesn't bottleneck the Blackwell GPU's processing power.
Researchers can use the GX10 for fine-tuning smaller models (7B to 30B) using techniques like QLoRA. While not a replacement for an H100 cluster for pre-training, it is an excellent "dev-box" for prototyping models that will eventually be deployed to the cloud.
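To see why QLoRA-style fine-tuning fits comfortably on this class of hardware, here is a quick count of LoRA adapter parameters for a Llama-2-7B-shaped model. Square projections and four target modules per layer are simplifying assumptions; real architectures vary.

```python
def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Rough count of LoRA adapter parameters: each target projection
    gets two low-rank matrices (d_model x r and r x d_model).
    Assumes square projections, which real models only approximate."""
    return n_layers * targets_per_layer * 2 * d_model * rank

# Llama-2-7B-like shape: d_model=4096, 32 layers, rank 16 on q/k/v/o:
adapters = lora_trainable_params(4096, 32, 16)
print(adapters)                         # 16777216 (~16.8M)
print(round(adapters / 7e9 * 100, 2))   # 0.24 (% of the base model)
```

Training well under 1% of the weights, with the frozen base held in 4-bit, is what keeps the whole job inside the 128GB envelope.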
When evaluating the best hardware for local AI agents in 2026, the GX10 is most often weighed against the Apple Mac Studio (M2/M3 Ultra), the GIGABYTE MS01, and high-end consumer GPUs such as the RTX 5090. Each trades off memory capacity, ecosystem lock-in, and power draw differently, but only the GX10 pairs 128GB of unified memory with native CUDA support.
The ASUS Ascent GX10 - 2TB is the definitive choice for practitioners who need a high-VRAM, CUDA-compatible workstation that prioritizes efficiency and a small footprint over raw, power-hungry desktop performance. Its 2TB PCIe 4.0 storage provides ample space for a library of the latest GGUF or EXL2 model files, making it a versatile hub for local AI development.
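A quick sanity check on that storage claim, assuming ~40 GB per 4-bit 70B GGUF file and a 100 GB reserve for the OS and datasets (both figures are ballpark assumptions):

```python
def models_per_drive(drive_tb: float, model_gb: float,
                     reserve_gb: float = 100) -> int:
    """How many quantized model files fit on the NVMe drive, after
    reserving space for the OS and datasets (reserve is an assumption)."""
    usable_gb = drive_tb * 1000 - reserve_gb
    return int(usable_gb // model_gb)

# ~40 GB per 4-bit 70B GGUF file on the 2 TB tier:
print(models_per_drive(2, 40))  # 47
```

Even a library of the largest practical quantized models leaves room to spare, which is the point of the 2TB tier.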
Representative models with estimated on-device throughput and memory footprint (rows missing a model name in the source are left blank):

| Model | Developer | Parameters | Rating | Throughput | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | SS | 40.8 tok/s | 5.4 GB |
| BAGEL-7B-MoT | Bytedance | 14B (7B active) | AA | 45.9 tok/s | 4.8 GB |
| Stable Diffusion 3.5 Large | Stability AI | 8.1B | AA | 40.2 tok/s | 5.5 GB |
| e5-mistral-7b-instruct | intfloat (Microsoft Research) | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| SFR-Embedding-Mistral | Salesforce | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| Linq-Embed-Mistral | Linq AI Research | 7.1B | AA | 45.9 tok/s | 4.8 GB |
| GritLM-7B | GritLM (Contextual AI) | 7.2B | AA | 45.3 tok/s | 4.9 GB |
| llama-embed-nemotron-8b | NVIDIA | 7.5B | AA | 45.9 tok/s | 4.8 GB |
| F2LLM-v2-8B | CodeFuse-AI (Ant Group) | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Octen-Embedding-8B | Octen AI | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| Qwen3-Embedding-8B | Qwen/Alibaba | 7.6B | AA | 46.5 tok/s | 4.7 GB |
| gte-Qwen2-7B-instruct | Alibaba-NLP (Tongyi Lab) | 7.1B | AA | 49.0 tok/s | 4.5 GB |
| | | 8B | AA | 38.8 tok/s | 5.7 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| FLUX.2 [klein] 9B | Black Forest Labs | 9B | AA | 36.5 tok/s | 6.0 GB |
| | | 9B | AA | 36.5 tok/s | 6.0 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 45.9 tok/s | 4.8 GB |
| Phi-4-multimodal-instruct | Microsoft | 5.6B | AA | 55.9 tok/s | 3.9 GB |
| Z-Image-Turbo | Alibaba | 6B | AA | 52.6 tok/s | 4.2 GB |
| BOOM_4B_v1 | ICT-CAS TIME / Querit | 4B | AA | 81.2 tok/s | 2.7 GB |
| F2LLM-v2-4B | CodeFuse-AI (Ant Group) | 4B | AA | 81.2 tok/s | 2.7 GB |
| Qwen3-Embedding-4B | Qwen/Alibaba | 4B | AA | 81.2 tok/s | 2.7 GB |
| FLUX.2 [klein] 4B | Black Forest Labs | 4B | AA | 74.5 tok/s | 3.0 GB |
| Mochi 1 Preview | Genmo AI | 10B | AA | 33.2 tok/s | 6.6 GB |
| | | 11.8B | AA | 30.9 tok/s | 7.1 GB |