No image

GEEKOM

GEEKOM IT15 (Ultra 9 285H)

Name: GEEKOM IT15 (Ultra 9 285H)
Brand: GEEKOM
Price: 1399 USD
Availability: InStock

Maxed-out Arrow Lake-H NUC-class mini PC with Intel Core Ultra 9 285H, Arc 140T iGPU, 32GB DDR5, 2TB NVMe. 99 TOPS combined platform AI for 8B-class on-device inference.

AI PCs & LaptopsIn Stock

Edge AIMobile / On-DeviceEnergy Efficient

Buy on Amazon$1,399Calculate ROI

PayPerQ—Pay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ

Our Take

Best for: Entry-level 7B inference and embedding workloads

8 GB will run a 7B Q4 quant and most embedding models, but the KV cache budget is tight. Better as a stepping stone than a long-term home for AI work.

Pair this with

Specifications

Overview

The GEEKOM IT15 (Ultra 9 285H) is a NUC-class mini PC that packs Intel’s Arrow Lake-H flagship mobile silicon into a 4.4” x 4.6” chassis. At $1,399 (MSRP, though street pricing often lands ~$1,200), it targets the prosumer edge of the AI inference market—a bridge between consumer-grade laptops and workstation-class machines. GEEKOM has been a reliable builder of compact systems, and with this model they’ve pushed the combined platform AI performance to 99 TOPS, leveraging the CPU, Intel Arc 140T iGPU, and NPU simultaneously.

For practitioners running 8B-class models locally, this hardware hits a practical sweet spot. It’s not a data-center GPU, but it doesn’t need to be; the shared 32GB DDR5 system memory (90 GB/s bandwidth) acts as unified VRAM for the iGPU, enabling 8B parameter models at Q4–Q5 quantizations with workable token rates. The 65W sustained TDP (peaking around 100W briefly) means it’s viable for always-on inference servers, field deployments, or desktop AI workstations that don’t demand a rack.

What sets the IT15 apart is its efficiency per watt. You can run local LLMs, multimodal models, and agent pipelines without a dedicated GPU, at a power draw that won’t spike your utility bill. It competes directly with high-end mini PCs from Minisforum and ASUS (e.g., NUC 14 Pro+), and with lower-TDP laptops that don’t have the same memory bandwidth.

AI Performance & Specifications

Memory and VRAM

32GB DDR5-5600 dual-channel, shared with the Intel Arc 140T iGPU.
Effective VRAM: ~8 GB available to the GPU after OS and background processes (Windows 11 Pro). This is the hard ceiling for model weight + context + KV cache.
Memory bandwidth: 90 GB/s. For comparison, a desktop RTX 4060 has ~272 GB/s; the IT15’s bandwidth is the primary bottleneck for token generation speed.

Compute

Combined platform AI: 99 TOPS (INT8) across CPU + Arc iGPU + NPU. In practice, most inference runs on the iGPU (~70 TOPS) with CPU fallback for memory-bound operations.
FP16 performance on the iGPU is roughly 2.7 TFLOPS (estimated via INT8/FP16 ratio), adequate for simple vision tasks but not for training.
NPU: Intel AI Boost handles lightweight, always-on audio and vision preprocessing (e.g., wake-word detection, background blur). Not used for LLM inference.

Power and Thermals

Sustained TDP: ~50-55W after boost expires (≈30s burst up to 90-100W). Real-world power draw under LLM inference load is ~45-55W, making it energy-efficient for 24/7 operation.
Thermals: Fan noise is audible under sustained load (reviewers note it as “loud”). The chassis stays cool to the touch, but the system throttles to maintain safe temps. For AI workloads, you’ll want good airflow or a quiet location.

Connectivity

2x USB4 (40 Gbps) can be used for external GPUs (eGPU), though OCuLink is absent—a notable gap for those wanting to add a dedicated card later.
2.5GbE LAN, WiFi 7, BT 5.4, dual HDMI 2.1, SD card slot. Supports 8K quad display output.
Storage: 2TB NVMe PCIe Gen4 (tested sequential writes ~1.2 GB/s when almost full—adequate for model storage and swap).

What Models Can It Run?

The shared memory architecture means model fitting depends on both weight size and context length. Here’s what works:

Fits comfortably (8B at Q4–Q5)

Llama 3.1 8B (Q4_K_M): ~5.5 GB weights + 2 GB buffer for 4K context. Token generation: ~10–15 tok/s on iGPU with llama.cpp (CPU offload layer mixing). Acceptable for chat, summarization, code generation.
Mistral 7B (Q5_K_M): ~4.8 GB, runs at 15–18 tok/s. Excellent for real-time conversation.
Qwen 2.5 7B (Q4): Similar performance. Multilingual tasks are smooth.
DeepSeek-R1 7B (Q4): Runs well; CoT reasoning may stress context window—keep under 8K tokens for best speed.

Manageable with patience (13B at Q3)

Llama 3.1 13B (Q3_K_M): ~7 GB weights. Leaves minimal headroom for context. Expect 6–9 tok/s, with frequent KV cache evictions. Fine for batch inference or async tasks.
Mixtral 8x7B (Q2) : Too large (~10 GB) – will exceed VRAM and swap to system RAM, dropping to <2 tok/s. Not recommended.

Multimodal and vision models

LLaVA 1.5 7B (Q4): Runs at ~10 tok/s for text generation; image encoding uses iGPU compute. Works for simple image captioning.
Qwen-VL 7B: Similar behavior. Processing large images (4K) may cause 1–2s latency per image.

Long-context tasks

With 8B Q4 models, you can push to 8K–16K context before hitting the wall. Beyond that, use sliding window or Flash Attention. The 90 GB/s bandwidth limits speed—expect 40–50% slower than a desktop RTX 4070.

Sweet spot

8B at Q4_K_S with 4K context. This yields ~13–15 tok/s, balancing quality and speed. For coding or structured tasks, Q8_0 on a 3B model (e.g., Phi-3-mini) runs at 25+ tok/s—great for autocomplete and agent scripts.

Use Cases & Target Audience

Hobbyists and local LLM enthusiasts

If you want a dedicated always-on machine for ChatGPT-style queries, local RAG, or proof-of-concept agents, the IT15 eliminates GPU complexity. Plug in, install LM Studio or Ollama, and start inferring. The 2TB SSD holds dozens of quantized models.

Edge AI and field deployments

The 65W sustained power, compact size (less than 2” tall), and passive capability under light load make it viable for remote edge nodes—digital signage with AI, on-device transcription, or low-volume inference at a retail location. Pair with a battery UPS for mobile use.

Developers building AI-powered applications

Test agents, chaining calls to local models, or run a local inference server for a small team (1–3 concurrent users). The USB4 port supports an external GPU if you later need more VRAM. The IT15 is a flexible dev sandbox.

Not for training

The shared memory and iGPU compute (2.7 TFLOPS FP16) are insufficient for any meaningful training or fine-tuning. Stick to inference.

How It Compares

vs. Minisforum MS-01 (i9-13900H + RTX 4060 laptop GPU)

Price: MS-01 with RTX 4060 ~$1,500.
VRAM: 8GB dedicated GDDR6 vs. shared 8GB DDR5. The RTX 4060 gives ~220 GB/s bandwidth, enabling 8B Q4 models at 25–30 tok/s—roughly 2x faster than the IT15.
Form factor: MS-01 is slightly larger but includes OCuLink for future eGPU.
When to pick IT15: Lower power draw (65W vs 120W+), smaller footprint, quieter under light load, and similar VRAM ceiling. Choose the IT15 for energy-sensitive edge or desktop where speed isn’t critical.

vs. ASUS NUC 14 Pro+ (Ultra 9 185H + Arc iGPU)

CPU: The 285H is Arrow Lake (15th Gen), slightly faster single-core and better NPU. The 185H is Meteor Lake (14th Gen)
Memory bandwidth: Both 90 GB/s. The NUC 14 Pro+ uses LPDDR5x, achieving similar numbers.
Price: NUC 14 Pro+ ~$1,100, cheaper but with 1TB SSD and sometimes 16GB RAM.
When to pick IT15: Comes fully loaded (32GB + 2TB) out of box. The 285H’s NPU improvement matters if you use Windows Studio Effects or other ONNX runtime tasks. If budget is tighter, the NUC 14 Pro+ is a solid alternative with comparable inference performance.

Compatible AI Models

Hide F tierOnly popular models

56 models


Qwen3-30B-A3BAlibaba Cloud (Qwen)	30B(3B active)	AA	13.5 tok/s	5.4 GB
Llama 3 8B InstructMeta	8B	BB	12.8 tok/s	5.7 GB
Carnice-9b for Hermes agentkai-os	9B	BB	12.0 tok/s	6.0 GB
Llama 2 7B ChatMeta	7B	BB	15.1 tok/s	4.8 GB
Gemma 4 E2B ITGoogle	2B	BB	19.5 tok/s	3.7 GB
Mistral 7B InstructMistral AI	7B	BB	11.3 tok/s	6.4 GB
Gemma 4 E4B ITGoogle	4B	CC	10.5 tok/s	6.9 GB
Gemma 3 4B ITGoogle	4B	CC	10.5 tok/s	6.9 GB
AdPayPerQPay-per-query access to top LLMs without a subscription. Use any model on demand.Try PayPerQ
Qwen3.6 35B-A3BAlibaba Cloud	35B(3B active)	DD	8.5 tok/s	8.5 GB
Qwen3.5-35B-A3BAlibaba Cloud (Qwen)	35B(3B active)	DD	8.5 tok/s	8.5 GB
Llama 2 13B ChatMeta	13B	DD	8.6 tok/s	8.5 GB
Llama 3.1 8B InstructMeta	8B	FF	5.4 tok/s	13.3 GB
Qwen3.5-9BAlibaba Cloud (Qwen)	9B	FF	2.9 tok/s	24.6 GB
Mistral Small 3 24BMistral AI	24B	FF	1.9 tok/s	39.0 GB
Gemma 4 26B-A4B ITGoogle	26B(4B active)	FF	6.6 tok/s	11.0 GB
Qwen3.6-27BAlibaba Cloud	27B	FF	1.0 tok/s	72.8 GB
AdVast.aiAffordable on-demand GPU rentals for training and inference. Pick from thousands of hosts.Rent a GPU
Gemma 3 27B ITGoogle	27B	FF	1.7 tok/s	43.8 GB
Qwen3.5-27BAlibaba Cloud (Qwen)	27B	FF	1.0 tok/s	72.8 GB
Gemma 4 31B ITGoogle	31B	FF	0.9 tok/s	82.0 GB
Qwen3-32BAlibaba Cloud (Qwen)	32.8B	FF	1.3 tok/s	53.9 GB
Falcon 40B InstructTechnology Innovation Institute	40B	FF	3.0 tok/s	24.4 GB
Mixtral 8x7B InstructMistral AI	46.7B(12.9B active)	FF	6.4 tok/s	11.4 GB
LLaMA 65BMeta	65B	FF	1.8 tok/s	39.3 GB
Llama 2 70B ChatMeta	70B	FF	1.7 tok/s	43.4 GB
AdRunPodServerless and dedicated GPU cloud built for AI workloads. Spin up instances in seconds.Launch on RunPod
Llama 3 70B InstructMeta	70B	FF	1.6 tok/s	45.7 GB

Rows per page

Page 1 of 3

Quick Specs

VRAM8 GB

INT899 TOPS

TDP65 W

Memory BW90 GB/s

Max Params8B at Q4-Q5; 13B at Q3 with patience

Form FactorNUC-class Mini PC

CPUIntel Core Ultra 9 285H (16C/16T Arrow Lake-H, 6P+8E+2LP-E, up to 5.4 GHz)

GPUIntel Arc 140T iGPU

NPUIntel AI Boost

Combined Platform AI99 TOPS (CPU + iGPU + NPU)

Memory32GB DDR5-5600 (dual-channel, ~90 GB/s, shared with iGPU)

Storage2TB NVMe PCIe Gen4 x4 SSD

Display Output8K Quad Display

ConnectivityUSB4, WiFi 7, BT 5.4, 2.5GbE LAN

OSWindows 11 Pro