NVIDIA's most powerful embedded AI platform with 800 TOPS, 128GB LPDDR5X, and Blackwell GPU. Designed for humanoid robotics, autonomous vehicles, and safety-critical AI systems.
The NVIDIA Jetson AGX Thor Developer Kit represents the pinnacle of edge computing, specifically engineered to bridge the gap between data-center-class performance and embedded power constraints. As the successor to the AGX Orin, Thor utilizes the Blackwell GPU architecture to deliver a massive leap in compute density. For engineers building autonomous systems, humanoid robotics, or high-throughput agentic workflows at the edge, this is currently the highest-performing silicon available in a compact form factor.
Positioned as a premium, production-ready development platform, the AGX Thor is not a consumer-grade toy. It is a specialized tool for ML researchers and robotics engineers who require massive INT8 throughput and significant VRAM overhead for multi-modal sensor fusion and local LLM reasoning. While it competes loosely with high-end desktop GPUs like the RTX 4090 or specialized Mac Studio configurations, its true competition lies in the industrial sector—outperforming the previous Jetson AGX Orin by 7.5x in AI performance and offering a 3.5x improvement in efficiency.
The defining characteristic of the NVIDIA Jetson AGX Thor Developer Kit for AI is its Blackwell-based architecture. This isn't just a marginal upgrade; it introduces FP4 precision support, which is critical for the next generation of quantized model deployment.
With 800 TOPS of INT8 performance and 2,070 FP4 TFLOPS, Thor provides the raw compute necessary for real-time vision transformers and high-speed LLM inference. For practitioners, this means the ability to run complex perception pipelines alongside large language models without hitting the compute ceiling that plagues smaller edge devices.
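Before tuning anything, it is worth confirming what the software stack actually sees. A minimal sanity check, assuming a CUDA-enabled PyTorch build for JetPack (aarch64) is installed; the printed values will vary by unit and driver:

```python
import torch

# Query the integrated GPU that PyTorch sees on a JetPack install.
assert torch.cuda.is_available(), "No CUDA device visible; check your JetPack/PyTorch install"

props = torch.cuda.get_device_properties(0)
print(f"Device:             {props.name}")
print(f"Compute capability: {props.major}.{props.minor}")
print(f"Total memory:       {props.total_memory / 1024**3:.1f} GiB")  # unified LPDDR5X pool
print(f"SM count:           {props.multi_processor_count}")
```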
The 128GB LPDDR5X memory is a game-changer for local AI agents. In the edge AI space, VRAM is the primary bottleneck for model size. With 128GB of unified memory, Thor effectively functions as a high-capacity GPU for AI, allowing developers to load massive weights that would typically require a multi-GPU server rack. The 273 GB/s memory bandwidth ensures that while it may not match the 1TB/s+ speeds of an H100, it provides sufficient data movement to keep token generation fluid for most real-time applications.
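That bandwidth figure translates into a hard ceiling on decode speed: generating one token streams every active weight through the memory bus once, so throughput is bounded by bandwidth divided by active-weight bytes. A back-of-the-envelope sketch (weights only; KV-cache and activation traffic push real numbers lower):

```python
def decode_ceiling_tok_s(active_weight_bytes: float, bandwidth_gb_s: float = 273.0) -> float:
    """Upper bound on decode speed for a memory-bandwidth-bound LLM.

    Each generated token must read every active weight once, so
    tokens/s <= bandwidth / bytes_of_active_weights. Real results land
    below this because of KV-cache reads, activations, and overhead.
    """
    return bandwidth_gb_s * 1e9 / active_weight_bytes

# 70B dense model at ~4.5 bits/weight (Q4_K_M) is roughly 40 GB of weights.
print(f"70B dense @ Q4:       {decode_ceiling_tok_s(40e9):.1f} tok/s ceiling")   # ~6.8 tok/s

# MoE model with 37B active parameters at the same quantization touches ~21 GB per token.
print(f"MoE, 37B active @ Q4: {decode_ceiling_tok_s(21e9):.1f} tok/s ceiling")   # ~13.0 tok/s
```

The roughly 5 tok/s figures for 70B-class dense models in the benchmark table below sit just under this ceiling, which is consistent with a bandwidth-bound decode.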
NVIDIA has optimized Thor for power-constrained environments where traditional 450W desktop cards are non-viable. The 3.5x efficiency gain over Orin allows for higher sustained clock speeds during long-running inference tasks, making it the best AI chip for local deployment in rugged or mobile environments like autonomous vehicles.
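On Jetson platforms, the power envelope is managed through the stock JetPack tools `nvpmodel` and `jetson_clocks`. A hedged sketch for pinning a profile before a long-running job; mode numbering is platform-specific, so treat mode 0 as an assumption and verify against `nvpmodel -q` output on your own unit:

```python
import subprocess

def run(cmd: list[str]) -> str:
    # Run a shell command and return its stdout; raises on failure.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

print(run(["sudo", "nvpmodel", "-q"]))   # show the active power mode
run(["sudo", "nvpmodel", "-m", "0"])     # mode 0 is typically the max-power profile (assumption)
run(["sudo", "jetson_clocks"])           # lock clocks at the selected mode's maximums
```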
The NVIDIA Jetson AGX Thor Developer Kit's 128GB memory pool opens doors for large language models that were previously closed to edge devices. The hardware is sized for running 70B+ parameter models at Q4 quantization and beyond.
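As a concrete starting point, here is a minimal sketch using llama-cpp-python; the model path and context size are illustrative, and a Q4_K_M 70B GGUF weighs roughly 40 GB, leaving ample headroom in 128 GB:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support)

# Load a Q4-quantized 70B-class GGUF entirely into Thor's unified memory.
llm = Llama(
    model_path="./models/llama-2-70b-chat.Q4_K_M.gguf",  # illustrative path
    n_gpu_layers=-1,   # offload every layer to the integrated GPU
    n_ctx=8192,        # context window; raise as memory allows
)

out = llm(
    "Q: What sensors does a humanoid robot need for indoor navigation? A:",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```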
Thor's primary design intent is humanoid robotics, so it excels at running segmentation (SAM), object detection (YOLOv11), and vision-language models (VILA, LLaVA) simultaneously. In an autonomous workflow, Thor can process multiple 4K camera streams while running a local LLM that makes navigational decisions based on visual input.
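A simplified sketch of that pattern: several capture threads feed one shared detector. Device indices, queue sizing, and the YOLO checkpoint are assumptions, and a production pipeline would typically use DeepStream or GStreamer rather than raw OpenCV:

```python
import threading
import queue

import cv2
from ultralytics import YOLO  # pip install ultralytics

# Bounded queue gives backpressure; a real pipeline would drop stale frames instead.
frames = queue.Queue(maxsize=8)

def capture(cam_id: int) -> None:
    # GMSL2 cameras typically surface as /dev/video* nodes via the JetPack camera stack.
    cap = cv2.VideoCapture(cam_id)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames.put((cam_id, frame))

for cam_id in (0, 1, 2, 3):  # illustrative device indices
    threading.Thread(target=capture, args=(cam_id,), daemon=True).start()

model = YOLO("yolo11n.pt")  # nano checkpoint; swap in a larger variant as needed
while True:
    cam_id, frame = frames.get()
    results = model(frame, verbose=False)  # single shared GPU context
    print(f"cam {cam_id}: {len(results[0].boxes)} detections")
```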
Exact numbers vary with the inference stack (TensorRT-LLM vs. llama.cpp) and quantization level, but the benchmark table at the end of this section summarizes the AI inference performance users can expect from the NVIDIA Jetson AGX Thor Developer Kit.
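To produce a comparable number on your own unit, a minimal timing harness (again using llama-cpp-python; the model file and prompt are placeholders, and a single run is indicative rather than definitive):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Explain sensor fusion in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

# Count tokens actually generated (the model may stop early at an EOS token).
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens / elapsed:.1f} tok/s (prefill + decode averaged)")
```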
The AGX Thor is the best edge device for autonomous workflows where cloud latency is unacceptable and data privacy is paramount.
When evaluating the NVIDIA Jetson AGX Thor Developer Kit vs. alternatives, it is important to distinguish between "raw desktop power" and "edge-integrated power."
The Orin was the previous gold standard with 275 TOPS. Thor provides a 7.5x increase in AI compute. If your workload involves LLMs larger than 13B or requires real-time 3D world-model generation, the upgrade to Thor is mandatory. Orin remains a viable mid-tier option for simpler CV tasks, but Thor is the clear choice for "Agentic" edge AI.
The Mac Studio is often cited as the best hardware for local AI agents in 2025 due to its unified memory (up to 192GB). However, the Mac Studio lacks the industrial I/O, GMSL2 camera inputs, and ruggedized power delivery required for edge deployment. While the Mac is a superior desktop development environment, the AGX Thor is the superior deployment platform for robotics and field-based AI.
An RTX 6000 Ada offers 48GB of VRAM and higher raw TFLOPS, but at a significantly higher power draw and without the ARM-based embedded ecosystem. For practitioners who need more than 48GB of VRAM in a single, efficient package without building a multi-GPU tower, the Thor’s 128GB unified memory pool is a more elegant and power-efficient solution for large-scale model inference.
LLM inference benchmarks for the NVIDIA Jetson AGX Thor Developer Kit:

| Model | Developer | Parameters | Tier | Throughput | Memory |
|---|---|---|---|---|---|
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | S | 40.8 tok/s | 5.4 GB |
| | | 8B | A | 38.8 tok/s | 5.7 GB |
| Llama 2 7B Chat | Meta | 7B | A | 45.9 tok/s | 4.8 GB |
| Gemma 4 E2B IT | Google | 2B | A | 59.3 tok/s | 3.7 GB |
| Qwen3.5-35B-A3B | Alibaba Cloud (Qwen) | 35B (3B active) | A | 25.8 tok/s | 8.5 GB |
| Mistral 7B Instruct | Mistral AI | 7B | A | 34.4 tok/s | 6.4 GB |
| Llama 2 13B Chat | Meta | 13B | A | 26.0 tok/s | 8.5 GB |
| Gemma 4 E4B IT | Google | 4B | A | 31.8 tok/s | 6.9 GB |
| Gemma 3 4B IT | Google | 4B | A | 31.8 tok/s | 6.9 GB |
| Mixtral 8x7B Instruct | Mistral AI | 46.7B (12.9B active) | B | 19.3 tok/s | 11.4 GB |
| Gemma 4 26B-A4B IT | Google | 26B (4B active) | B | 20.0 tok/s | 11.0 GB |
| Mistral Large 3 675B | Mistral AI | 675B (41B active) | B | 3.3 tok/s | 66.3 GB |
| DeepSeek-V3 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-R1 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-V3.1 | DeepSeek | 671B (37B active) | B | 3.7 tok/s | 59.8 GB |
| DeepSeek-V3.2 | DeepSeek | 685B (37B active) | B | 3.7 tok/s | 59.8 GB |
| Kimi K2 Instruct 0905 | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| Kimi K2 Thinking | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| Kimi K2.5 | Moonshot AI | 1000B (32B active) | B | 2.6 tok/s | 84.6 GB |
| Kimi K2 Instruct | Moonshot AI | 1000B (32B active) | B | 4.2 tok/s | 51.8 GB |
| Qwen3.5-27B | Alibaba Cloud (Qwen) | 27B | B | 3.0 tok/s | 72.8 GB |
| Gemma 4 31B IT | Google | 31B | B | 2.7 tok/s | 82.0 GB |
| | | 70B | B | 4.8 tok/s | 45.7 GB |
| Qwen3.5-397B-A17B | Alibaba Cloud (Qwen) | 397B (17B active) | B | 4.8 tok/s | 46.0 GB |
| Llama 2 70B Chat | Meta | 70B | B | 5.1 tok/s | 43.4 GB |