NVIDIA's most powerful embedded AI platform with 275 TOPS, 64GB LPDDR5, and Ampere GPU. The gold standard for edge AI, robotics prototyping, and autonomous machine development.
The NVIDIA Jetson AGX Orin 64GB Developer Kit is the flagship of NVIDIA’s edge computing lineup, designed specifically for practitioners who need data-center-class performance in a compact, low-power form factor. While consumer GPUs like the RTX 4090 dominate desktop workloads, the AGX Orin is built for autonomous machines, robotics, and distributed AI agents. Its unified memory architecture lets it load models that would typically require a multi-GPU setup in a traditional PC environment.
At a $1,999 MSRP, this kit serves as the primary development platform for engineers moving from cloud-based prototypes to local AI deployment. It competes directly with high-end industrial PCs and the Apple Mac Studio (M2/M3 Max) for local inference tasks. However, its specialized hardware, including two Deep Learning Accelerators (DLAs) and a 64GB LPDDR5 memory pool, makes it the gold standard among NVIDIA's edge devices for AI development.
The defining feature of the NVIDIA Jetson AGX Orin 64GB Developer Kit is its 275 INT8 TOPS of compute, delivered by 2,048 Ampere-architecture CUDA cores and 64 third-generation Tensor Cores. Unlike desktop cards, which must shuttle model data to the GPU over the PCIe bus, the Jetson gives the CPU and GPU direct access to a single memory pool with 204.8 GB/s of bandwidth, which is critical for the memory-bound nature of Large Language Model (LLM) inference.
The 64GB of LPDDR5 VRAM is the primary reason this device is favored for local AI agents in 2025. Because the CPU and GPU share this memory, you can allocate the vast majority of it to the model weights. This puts the AGX Orin in a unique position: it offers more VRAM than an RTX 4090 (24GB) for less than half the price of an H100, making it one of the most cost-effective ways to get a 64GB GPU for AI workloads.
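To see how much of that pool a model can actually claim, the quick check below queries the CUDA-visible capacity through PyTorch. This is a minimal sketch, assuming a JetPack install with NVIDIA's PyTorch wheel; the 8 GiB OS headroom figure is an assumption for illustration, not a measured value.

```python
# Minimal sketch: inspect the unified memory pool visible to CUDA on the AGX Orin.
# Assumes a JetPack environment with an NVIDIA-built PyTorch wheel installed.
import torch

assert torch.cuda.is_available(), "CUDA device not visible"
props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 1024**3

print(f"Device:        {props.name}")
print(f"CUDA capacity: {total_gib:.1f} GiB (shared CPU/GPU LPDDR5)")

# Rough budget: leave headroom for the OS, desktop, and CUDA context,
# and treat the remainder as available for model weights + KV cache.
os_headroom_gib = 8  # assumption; tune for your image and services
usable_gib = total_gib - os_headroom_gib
print(f"Usable for models (assuming ~{os_headroom_gib} GiB headroom): {usable_gib:.1f} GiB")
```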
Operating within a configurable 15W to 60W TDP, the AGX Orin is significantly more efficient than a workstation. For teams building autonomous workflows or edge-based inference servers, this means high-density deployments without the thermal and power overhead of x86 systems. It is effectively a "production-ready" dev kit that mimics the performance of the production modules used in high-end robotics.
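Power modes are switched on JetPack with the nvpmodel utility, and clocks can be pinned with jetson_clocks. The wrapper below is a hedged sketch: both utilities ship with standard Jetson images, but the specific mode IDs vary by board and JetPack release, so check /etc/nvpmodel.conf on your unit before relying on any particular number.

```python
# Minimal sketch: query and switch Jetson power modes from Python.
# nvpmodel and jetson_clocks are standard JetPack utilities; mode IDs
# are board-specific (see /etc/nvpmodel.conf), so treat them as assumptions.
import subprocess

def current_power_mode() -> str:
    """Return nvpmodel's report of the active power mode."""
    out = subprocess.run(["sudo", "nvpmodel", "-q"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def set_power_mode(mode_id: int) -> None:
    """Switch power mode (e.g. MAXN vs a capped 15W/30W profile)."""
    subprocess.run(["sudo", "nvpmodel", "-m", str(mode_id)], check=True)
    # Optionally pin clocks to the maximum allowed by the selected mode.
    subprocess.run(["sudo", "jetson_clocks"], check=True)

if __name__ == "__main__":
    print(current_power_mode())
```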
The AGX Orin 64GB Developer Kit's AI inference performance is best evaluated by its ability to run large models locally that would fail to load on standard consumer hardware.
The 64GB capacity is the sweet spot for running 13B to 34B parameter models with high precision, or 70B models with heavy quantization.
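As a back-of-envelope check on that claim, the sketch below estimates weight footprints at a few precisions. The 10% runtime overhead and the ~58 GiB usable figure are assumptions for illustration, not measurements, and KV cache is budgeted separately.

```python
# Back-of-envelope weight footprint: params * bytes-per-param, plus ~10% overhead.
# The overhead factor and the ~58 GiB usable figure are assumptions.

def weights_gib(params_b: float, bits_per_param: float, overhead: float = 1.10) -> float:
    """Approximate on-device size of model weights in GiB."""
    bytes_total = params_b * 1e9 * (bits_per_param / 8) * overhead
    return bytes_total / 1024**3

USABLE_GIB = 58  # 64 GB pool minus OS/runtime headroom (assumption)

for name, params_b, bits in [
    ("13B @ FP16", 13, 16),
    ("34B @ INT8", 34, 8),
    ("70B @ 4-bit", 70, 4),
    ("70B @ FP16", 70, 16),   # shows why 70B needs quantization
]:
    size = weights_gib(params_b, bits)
    fits = "fits" if size < USABLE_GIB else "does not fit"
    print(f"{name:12s} ~{size:6.1f} GiB -> {fits}")
```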
Thanks to the 2x NVDLA v2.0 (Deep Learning Accelerators), the Orin can offload standard vision tasks (YOLOv8, Segment Anything Model) from the GPU, allowing the Ampere cores to focus entirely on LLM or agentic logic. This makes it the best edge device for autonomous workflows where simultaneous vision processing and natural language reasoning are required.
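Routing a vision model onto a DLA core happens at engine-build time in TensorRT. The sketch below shows the general shape using the TensorRT Python bindings that ship with JetPack; the ONNX filename is a placeholder, and layer support on the DLA varies by model, which is why the GPU fallback flag is set.

```python
# Minimal sketch: build a TensorRT engine that targets one of the two DLA cores,
# leaving the Ampere GPU free for LLM/agent work. Assumes the TensorRT Python
# bindings from JetPack; "yolov8n.onnx" is a placeholder export path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov8n.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA   # run layers on the DLA
config.DLA_core = 0                               # two cores available: 0 or 1
config.set_flag(trt.BuilderFlag.FP16)             # DLA requires FP16 or INT8
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)     # unsupported layers fall back to GPU

engine_bytes = builder.build_serialized_network(network, config)
with open("yolov8n_dla0.engine", "wb") as f:
    f.write(engine_bytes)
```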
For developers building agentic workflows, the AGX Orin 64GB Developer Kit's memory capacity allows for long-context retention. You can keep multiple agents resident in memory, or a single agent with a 32k+ token context window, which is essential for complex multi-step reasoning tasks in 2025.
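KV-cache growth is what usually caps context length, so a rough sizing formula is useful. The sketch below uses illustrative architecture values (grouped-query attention with assumed layer and head counts), not the exact configuration of any particular checkpoint.

```python
# Rough KV-cache sizing for long-context agents. The layer/head/dim values
# below are illustrative assumptions; substitute your model's actual config.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    elems = 2 * layers * kv_heads * head_dim * context_tokens
    return elems * bytes_per_elem / 1024**3

# ~70B-class model with GQA (assumed: 80 layers, 8 KV heads, head_dim 128)
print(f"70B-class, 32k context: {kv_cache_gib(80, 8, 128, 32_768):.1f} GiB")
# ~8B-class model (assumed: 32 layers, 8 KV heads, head_dim 128)
print(f"8B-class,  32k context: {kv_cache_gib(32, 8, 128, 32_768):.1f} GiB")
```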
This is the "best AI chip for local deployment" in environments where a cloud connection is high-latency, unreliable, or insecure. It is widely used in autonomous machines, robotics, and distributed edge deployments where inference must run entirely on-device.
While hobbyists might gravitate toward the RTX 4090 for raw speed, the AGX Orin is preferred by those who need to simulate edge constraints or who require more than 24GB of VRAM for large-scale model experimentation without the $5,000+ price tag of an RTX 6000 Ada.
The Mac Studio is the closest competitor in terms of unified memory. While the Mac may offer higher memory bandwidth, the Jetson wins on ecosystem compatibility. The Jetson runs native Ubuntu Linux and the NVIDIA JetPack SDK, providing direct access to CUDA, TensorRT, and Triton Inference Server—the industry standards for AI deployment. The Mac is a superior workstation for development; the Jetson is a superior platform for deployment and integration into non-desktop hardware.
An RTX 4090 will outperform the AGX Orin in raw tokens per second for models that fit within its 24GB VRAM. However, once you exceed 24GB—such as running a Llama 3.1 70B or a Mixtral 8x7B—the 4090 will require model sharding across multiple GPUs or offloading to system RAM, which craters performance. The AGX Orin 64GB is the better choice for practitioners prioritizing model size and power efficiency over raw peak throughput for small models.
The Orin Nano is an entry-level alternative (up to 40 TOPS). While suitable for simple sensor fusion, it lacks the VRAM and compute to run modern LLMs effectively. For any task involving generative AI or complex agentic behavior, the AGX Orin 64GB is the necessary minimum.
| Model | Developer | Parameters | Rating | Throughput | Memory |
| --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | Alibaba Cloud (Qwen) | 30B (3B active) | AA | 30.6 tok/s | 5.4 GB |
| Gemma 4 E2B IT | Google | 2B | AA | 44.5 tok/s | 3.7 GB |
| Llama 2 7B Chat | Meta | 7B | AA | 34.4 tok/s | 4.8 GB |
| | | 8B | AA | 29.1 tok/s | 5.7 GB |
