
NVIDIA DGX Spark

Name: NVIDIA DGX Spark
Brand: NVIDIA
Price: 4699 USD
Availability: In Stock

NVIDIA's reference architecture powered by the GB10 Grace Blackwell Superchip, delivering up to 1 PFLOP of FP4 AI performance in a compact, edge-ready form factor.

Category: AI PCs & Laptops

Tags: Edge AI, Best for LLMs, Premium / High-End


Quick Specs

VRAM: 128 GB (unified memory)
FP16: 29.71 TFLOPS
INT8: 250 TOPS
TDP: 140 W
Memory bandwidth: 273 GB/s
Max parameters: 200B
CPU architecture: 20-core Arm (10 Cortex-X925 + 10 Cortex-A725)
GPU architecture: Blackwell (48 SMs)
Network interface: 1x 10GbE RJ-45, ConnectX-7 200 Gbps, Wi-Fi 7, Bluetooth 5.4
Dimensions: 150 x 150 x 50.5 mm
Weight: 1.2 kg

Specifications

The NVIDIA DGX Spark represents a fundamental shift in local AI development, moving the Blackwell architecture out of the data center and onto the engineer's desk. It is a reference architecture designed for practitioners who require massive VRAM and high-throughput inference without the latency or privacy concerns of cloud-based providers.

At an MSRP of $4,699, the DGX Spark is positioned as a premium AI workstation for edge deployments and high-end local development. It bridges the gap between consumer-grade RTX 4090 builds, which are limited by PCIe lanes and 24 GB of VRAM per card, and full-scale DGX H100 racks. For teams building agentic workflows or deploying local LLMs, the Spark provides the headroom to run frontier-class models in a quiet, 140 W TDP package with a 150 x 150 mm footprint.

AI Performance & Specifications

The core of the DGX Spark is the GB10 Grace Blackwell Superchip. While consumer hardware often prioritizes rasterization, the Spark is optimized for tensor operations and high-speed memory access.

VRAM and Memory Architecture: The system features 128 GB of unified memory. Unlike traditional GPU/CPU splits, this coherent memory pool allows the Blackwell GPU to access the full 128 GB, enabling the loading of massive model weights that would typically require multiple A100s or a Mac Studio M2/M3 Ultra.
Memory Bandwidth: At 273 GB/s, bandwidth is the primary driver of token generation speed, because every generated token must stream the model's active weights from memory. It is far lower than an H100's roughly 3.35 TB/s but well above standard dual-channel DDR5 systems, so decode speed on the Spark scales with how many bytes of weights each token touches, which favors quantized and sparse MoE models.
Compute Throughput: The system delivers 250 TOPS of INT8 and 29.71 TFLOPS of FP16 performance. The standout spec, however, is hardware support for FP4 (4-bit floating point), which lifts NVIDIA's quoted peak to 1 PFLOP (a figure that assumes structured sparsity). Fast FP4 math makes aggressive 4-bit quantization practical, and block-scaled formats such as NVFP4 typically keep the accuracy loss modest, though it varies by model.
Power Efficiency: Operating at a 140W TDP, the DGX Spark delivers a performance-per-watt ratio that makes it ideal for 24/7 edge inference or office environments where 1000W+ workstations are impractical.
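The bandwidth and compute figures above can be combined into a rough roofline-style ceiling for single-stream decode. This is a back-of-envelope sketch under stated assumptions, not a benchmark: it assumes each generated token reads every active weight once and costs about 2 FLOPs per active parameter, and it ignores KV-cache traffic, activations, and kernel overhead. The function name `decode_ceiling_tok_s` is hypothetical.

```python
"""Back-of-envelope roofline estimate for single-stream (batch-1) decode.

Assumptions (not measured): every generated token reads all active
weights from memory once and costs ~2 FLOPs per active parameter.
KV-cache reads, activations, and kernel overheads are ignored, so the
result is an optimistic upper bound.
"""

BANDWIDTH_BYTES_PER_S = 273e9   # DGX Spark memory bandwidth (spec sheet)
FP16_FLOPS_PER_S = 29.71e12     # DGX Spark FP16 throughput (spec sheet)


def decode_ceiling_tok_s(active_params: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/second: the slower of the memory and compute limits."""
    t_memory = active_params * bytes_per_param / BANDWIDTH_BYTES_PER_S
    t_compute = 2 * active_params / FP16_FLOPS_PER_S
    return 1.0 / max(t_memory, t_compute)


if __name__ == "__main__":
    # Dense 70B model at FP8 (1 byte/param): memory time dominates.
    print(f"70B @ FP8:           {decode_ceiling_tok_s(70e9, 1.0):6.1f} tok/s")
    # MoE model with ~3B active params at ~4.5 bits/param.
    print(f"3B active @ 4.5 bit: {decode_ceiling_tok_s(3e9, 4.5 / 8):6.1f} tok/s")
```

Under these assumptions a dense 70B model at FP8 tops out near 3.9 tokens/second, while an MoE model with about 3B active parameters at ~4.5 bits per weight gets a ceiling of roughly 160; real measurements land below both because of the ignored overheads. The compute term only starts to matter at large batch sizes, which is why bandwidth is the spec to watch for interactive use.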

What Models Can It Run?

The 128 GB memory capacity is the defining feature of the DGX Spark, making it one of the few single-chip solutions that can run models in the 200B-parameter class locally.

Large Language Models (LLMs)

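DeepSeek-V3 / Llama 3.1 405B: These exceed a single unit even with aggressive quantization. Llama 3.1 405B needs roughly 810 GB at FP16 and still about 137 GB at IQ2_M (~2.7 bits per weight), leaving no practical room for the OS, runtime, and KV cache, and the 671B-parameter DeepSeek-V3 is larger still. NVIDIA positions two DGX Spark units linked over ConnectX-7 for models up to 405B parameters. On a single unit the practical ceiling is the 200B class: a 235B model such as Qwen3-235B can be hosted with NVFP4 quantization via TensorRT-LLM, but at roughly 4.5 effective bits per weight its weights take about 132 GB, so headroom is minimal.
Llama 3.1 70B and other 70B-80B models: These run entirely in memory at FP8 or Q8_0 with significant room left for KV cache. Single-stream decode is bandwidth-bound, however: streaming about 70 GB of FP8 weights per token at 273 GB/s caps generation near 4 tokens/second. Batched serving raises aggregate throughput, and for real-time agentic loops, MoE models with only a few billion active parameters decode far faster.
Mixtral 8x22B: At 141B total parameters, it needs roughly 4- to 5-bit quantization to fit (about 80 GB of weights at ~4.5 bits per weight), which keeps every expert resident in unified memory so MoE (Mixture of Experts) inference runs without offloading.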
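Whether a given model and format fits comes down to simple arithmetic: parameters times bits per weight. The sketch below checks weights alone against 128 GB. Its bits-per-weight values are assumed approximate averages (GGUF k-quants and i-quants mix precisions across tensors), it treats capacity as 128 decimal GB, and it ignores the OS/runtime reservation and KV cache, so borderline results deserve a closer look. All names are hypothetical.

```python
"""Rough weight-memory check against the DGX Spark's 128 GB pool.

Assumption: weight bytes = parameters x bits-per-weight / 8. KV cache,
activations, and the OS/runtime share of unified memory are ignored.
"""

CAPACITY_GB = 128.0

# Approximate average bits per weight for each format (assumed, not exact).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "FP8": 8.0,
    "NVFP4": 4.5,   # 4-bit values plus one 8-bit scale per 16 values
    "Q3_K_S": 3.5,
    "IQ2_M": 2.7,
}


def weight_gb(params: float, fmt: str) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def fits(params: float, fmt: str, capacity_gb: float = CAPACITY_GB) -> bool:
    """True if the weights alone fit in the given capacity."""
    return weight_gb(params, fmt) <= capacity_gb


if __name__ == "__main__":
    cases = [
        ("Llama 3.1 405B", 405e9, "Q3_K_S"),
        ("Llama 3.1 405B", 405e9, "IQ2_M"),
        ("Llama 3.1 70B", 70e9, "FP8"),
        ("Mixtral 8x22B", 141e9, "FP16"),
        ("Mixtral 8x22B", 141e9, "NVFP4"),
    ]
    for name, params, fmt in cases:
        verdict = "fits" if fits(params, fmt) else "does not fit"
        print(f"{name:16s} {fmt:7s} {weight_gb(params, fmt):7.1f} GB  {verdict}")
```

Swapping in a different bits-per-weight figure for a specific GGUF file or checkpoint is the main knob; the structure of the check stays the same.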

Multimodal and Long-Context Tasks

The 128 GB buffer is particularly valuable for long-context work. Engineers can run Llama 3.1 with its full 128k context window without running out of memory, a workload that often triggers out-of-memory errors on 24 GB or 48 GB cards. It also handles multimodal models such as LLaVA-v1.6 and CogVLM, with enough memory to process high-resolution image embeddings alongside large text prompts.
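The long-context claim can be sanity-checked with the standard KV-cache size formula. This sketch assumes an FP16/BF16 cache with no KV quantization and uses the published Llama 3.1 attention shapes (32 layers for 8B and 80 for 70B, each with 8 key/value heads of dimension 128); the helper name `kv_cache_bytes` is hypothetical.

```python
"""KV-cache size for a decoder-only transformer with grouped-query attention."""

GIB = 2**30

# Llama 3.1 attention shapes: (layers, KV heads, head dim).
LLAMA_31 = {
    "8B": (32, 8, 128),
    "70B": (80, 8, 128),
}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for `context_len` tokens.

    The leading 2 counts one key and one value vector per layer and KV head.
    bytes_per_elem=2 assumes an FP16/BF16 cache (no KV quantization).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len * batch


if __name__ == "__main__":
    context = 128 * 1024  # 131,072 tokens
    for size, (layers, kv_heads, head_dim) in LLAMA_31.items():
        gib = kv_cache_bytes(layers, kv_heads, head_dim, context) / GIB
        print(f"Llama 3.1 {size}: {gib:.1f} GiB of KV cache at 128k context")
```

That gives 16 GiB for the 8B model and 40 GiB for the 70B model per 128k-token sequence. Added to roughly 70 GB of FP8 weights, the 70B case lands near 113 GB, inside 128 GB but with little to spare; the cache grows linearly with batch size and context length.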

Quantization Tradeoffs

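For practitioners, the best quality-to-speed tradeoff on this hardware is NVFP4. Blackwell's native FP4 support makes it the fastest path on the Spark, and software updates have reportedly delivered up to 2.5x performance gains over launch-day benchmarks. Because NVFP4 stores roughly 4.5 bits per weight (4-bit values plus a shared 8-bit scale per 16-value block), it shrinks weights about 3.5x versus FP16 and to a little over half their FP8 size, bringing models that would need far more than 128 GB at higher precision within the Spark's capacity.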
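To make block scaling concrete, here is a simplified pure-Python simulation of NVFP4-style quantization. It assumes the NVFP4 layout of FP4 (E2M1) values with one shared scale per 16-value block, which is where the ~4.5 effective bits per weight come from. It keeps the block scale as a Python float rather than FP8 (E4M3), omits the per-tensor FP32 scale, and uses plain nearest-value rounding, so it illustrates the error pattern rather than reproducing TensorRT-LLM's quantizer. All function names are hypothetical.

```python
"""Simplified "fake quantization" of NVFP4-style block scaling.

Assumed format: FP4 E2M1 values with one shared scale per 16-value block.
Simplification: the block scale stays a Python float; real NVFP4 encodes
it as FP8 (E4M3) and adds a per-tensor FP32 scale.
"""
import math

# Non-negative magnitudes representable by FP4 E2M1 (sign stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16
SCALE_BITS = 8

# Effective storage cost: 4 bits per value plus the amortized block scale.
EFFECTIVE_BITS = 4 + SCALE_BITS / BLOCK_SIZE  # 4.5


def quantize_block(block):
    """Map one block onto the E2M1 grid; return (signed codes, scale)."""
    amax = max(abs(v) for v in block)
    # Choose the scale so the block's largest magnitude lands on 6.0.
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda g: abs(target - g))
        codes.append(math.copysign(nearest, v))
    return codes, scale


def fake_quantize(weights):
    """Quantize then dequantize block by block to expose rounding error."""
    if len(weights) % BLOCK_SIZE:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        codes, scale = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    weights = [i / 10 for i in range(-8, 8)]  # 16 values in [-0.8, 0.7]
    restored = fake_quantize(weights)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"effective bits/weight: {EFFECTIVE_BITS}, worst error: {worst:.3f}")
```

The per-block scale is the design point: an outlier only stretches the grid for its own 16 neighbors, not for the whole tensor, which is what keeps 4-bit error manageable.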

Use Cases & Target Audience

The DGX Spark is not a gaming machine; it is a dedicated AI appliance for specific professional workflows.

Local AI Agent Developers: Teams building autonomous agents need low-latency inference to prevent "reasoning lag." The Spark can host a 70B model alongside a large KV cache, or run sparse MoE models with few active parameters at much higher decode speeds, which makes it a strong fit for agentic RAG (Retrieval-Augmented Generation) stacks.
Edge AI Deployment: With its compact 1.2 kg weight and 10GbE / ConnectX-7 200Gbps networking, the Spark is designed for on-site data processing where cloud latency is unacceptable—such as medical imaging, secure financial analysis, or industrial automation.
ML Researchers: The preinstalled NVIDIA AI software stack (including TensorRT-LLM, PyTorch, and vLLM) lets researchers go from repository to inference in minutes. It serves as a practical sandbox for fine-tuning via LoRA or QLoRA before scaling to an H100 cluster.
Privacy-Conscious Enterprises: For organizations handling sensitive IP, the DGX Spark allows local execution of frontier-level models (up to Llama 3.1 405B at 4-bit when two units are linked over ConnectX-7) without data ever leaving the internal network.
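For the LoRA fine-tuning workflow mentioned in the ML Researchers entry, a quick parameter count shows why adapters are cheap to train on a single Spark. This sketch assumes Llama 3.1 8B shapes (hidden size 4096, 32 layers, 8 KV heads of dimension 128) and adapts only `q_proj` and `v_proj`, the setting used in the original LoRA paper; many recipes target more modules. The helper names are hypothetical.

```python
"""Count the trainable parameters LoRA adds to a model.

LoRA freezes a weight W (d_out x d_in) and learns B (d_out x r) and
A (r x d_in), using W + B @ A in the forward pass.
"""


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix: B and A together."""
    return rank * (d_out + d_in)


# Assumed Llama 3.1 8B shapes: q_proj maps 4096 -> 4096, while v_proj maps
# 4096 -> 1024 because grouped-query attention uses 8 KV heads of dim 128.
HIDDEN, KV_DIM, LAYERS = 4096, 1024, 32
TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


def total_lora_params(rank: int) -> int:
    """Adapter parameters summed over all targeted matrices in all layers."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in TARGETS.values())
    return per_layer * LAYERS


if __name__ == "__main__":
    for r in (8, 16, 64):
        print(f"rank {r:3d}: {total_lora_params(r):,} trainable parameters")
```

At rank 16 this comes to about 6.8 million trainable parameters, under 0.1% of the base model's roughly 8 billion, so optimizer state for the adapters is tiny next to the frozen weights; QLoRA additionally stores those frozen weights in 4-bit form.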

How It Compares

When evaluating the DGX Spark, practitioners typically look at two alternatives: the Apple Mac Studio (M2/M3 Ultra) and custom multi-GPU workstations.

DGX Spark vs. Apple Mac Studio (M2/M3 Ultra)

The Mac Studio offers more total unified memory (up to 192 GB on M2 Ultra and up to 512 GB on M3 Ultra) and higher memory bandwidth (about 800 GB/s), which favors single-stream token generation. The DGX Spark wins on software ecosystem and raw compute: its Blackwell architecture supports FP4 and FP8 hardware acceleration, which the M2 and M3 Ultra lack. Furthermore, the Spark’s ConnectX-7 200 Gbps networking makes it a "cluster-ready" device, whereas the Mac is a standalone workstation. If you rely on the NVIDIA AI stack (CUDA, TensorRT), the Spark is the clear choice.

DGX Spark vs. Dual RTX 6000 Ada Build

A dual RTX 6000 Ada setup provides 96 GB of VRAM and higher raw TFLOPS, but at roughly three times the price (~$14,000) and much higher power draw (600 W+ for the two GPUs alone). The DGX Spark offers more memory (128 GB) in a smaller, more efficient package for about a third of the cost. For inference-heavy workloads where 128 GB is the "magic number" for model weights, the Spark provides better value.
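The value argument reduces to a few ratios. This minimal sketch uses only the figures quoted in this comparison; the ~$14,000 and 600 W numbers for the dual RTX 6000 Ada build are rough estimates, and 600 W covers only its two GPUs while 140 W is the Spark's TDP, so the power ratio is approximate rather than like-for-like. The `System` class is hypothetical.

```python
"""Price and power ratios from the figures quoted in this comparison.

Assumption: the dual RTX 6000 Ada price and power are rough estimates,
and the power figures are not measured at the wall.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    price_usd: float
    memory_gb: float
    power_w: float

    @property
    def usd_per_gb(self) -> float:
        """Dollars per GB of memory available for model weights."""
        return self.price_usd / self.memory_gb

    @property
    def gb_per_100w(self) -> float:
        """GB of memory per 100 W of quoted power."""
        return self.memory_gb / self.power_w * 100


SPARK = System("DGX Spark", 4699, 128, 140)
DUAL_ADA = System("Dual RTX 6000 Ada", 14000, 96, 600)

if __name__ == "__main__":
    for s in (SPARK, DUAL_ADA):
        print(f"{s.name:18s} ${s.usd_per_gb:6.1f}/GB  {s.gb_per_100w:5.1f} GB per 100 W")
    print(f"price ratio: {DUAL_ADA.price_usd / SPARK.price_usd:.2f}x")
```

This works out to about $37 per GB for the Spark versus about $146 for the dual-Ada build. The Ada build's higher TFLOPS do not appear in these ratios, which is why the comparison favors the Spark mainly for memory-bound inference.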

Why Choose DGX Spark?

Choose the DGX Spark if your primary bottleneck is memory capacity and physical space. It is one of the most power-efficient ways to run models in the 200B-parameter class at the edge, offering a "plug-and-play" experience for the modern AI engineer.

Compatible AI Models


56 models (first 25 shown)

Model	Developer	Parameters	Tier	Speed	Memory
Qwen3-30B-A3B	Alibaba Cloud (Qwen)	30B (3B active)	S	40.8 tok/s	5.4 GB
Llama 3 8B Instruct	Meta	8B	A	38.8 tok/s	5.7 GB
Carnice-9b for Hermes agent	kai-os	9B	A	36.5 tok/s	6.0 GB
Llama 2 7B Chat	Meta	7B	A	45.9 tok/s	4.8 GB
Gemma 4 E2B IT	Google	2B	A	59.3 tok/s	3.7 GB
Qwen3.6 35B-A3B	Alibaba Cloud	35B (3B active)	A	25.8 tok/s	8.5 GB
Qwen3.5-35B-A3B	Alibaba Cloud (Qwen)	35B (3B active)	A	25.8 tok/s	8.5 GB
Mistral 7B Instruct	Mistral AI	7B	A	34.4 tok/s	6.4 GB
Llama 2 13B Chat	Meta	13B	A	26.0 tok/s	8.5 GB
Gemma 4 E4B IT	Google	4B	A	31.8 tok/s	6.9 GB
Gemma 3 4B IT	Google	4B	A	31.8 tok/s	6.9 GB
Mixtral 8x7B Instruct	Mistral AI	46.7B (12.9B active)	B	19.3 tok/s	11.4 GB
Gemma 4 26B-A4B IT	Google	26B (4B active)	B	20.0 tok/s	11.0 GB
Mistral Large 3 675B	Mistral AI	675B (41B active)	B	3.3 tok/s	66.3 GB
GLM-4.6	Z.ai	355B (32B active)	B	3.1 tok/s	70.3 GB
DeepSeek-V3	DeepSeek	671B (37B active)	B	3.7 tok/s	59.8 GB
DeepSeek-R1	DeepSeek	671B (37B active)	B	3.7 tok/s	59.8 GB
DeepSeek-V3.1	DeepSeek	671B (37B active)	B	3.7 tok/s	59.8 GB
DeepSeek-V3.2	DeepSeek	685B (37B active)	B	3.7 tok/s	59.8 GB
Kimi K2 Instruct 0905	Moonshot AI	1000B (32B active)	B	2.6 tok/s	84.6 GB
Kimi K2 Thinking	Moonshot AI	1000B (32B active)	B	2.6 tok/s	84.6 GB
Kimi K2.5	Moonshot AI	1000B (32B active)	B	2.6 tok/s	84.6 GB
GLM-5	Z.ai	744B (40B active)	B	2.5 tok/s	87.7 GB
GLM-5.1	Z.ai	744B (40B active)	B	2.5 tok/s	87.7 GB
Kimi K2.6	Moonshot AI	1000B (32B active)	B	2.6 tok/s	86.2 GB
