AI Hardware Calculator

Estimate which AI models your hardware can run, how fast, and at what quality. The calculator runs entirely in your browser.

Hardware detection happens in your browser. No data is sent to any server.

What the AI Hardware Compatibility Calculator Does

The AI hardware compatibility calculator is a free in-browser tool that estimates which AI models you can run locally on a given device, how fast they will generate tokens, and at what quality grade. It computes VRAM requirements from model weights, KV cache, and runtime overhead, then projects tokens-per-second from your GPU memory bandwidth, all without sending any data to a server.

Pick a hardware product from the directory or detect yours automatically in the browser. The calculator then iterates over every tracked open-source model, picks the highest-quality quantization that fits your VRAM (from FP16 down to Q2_K), and grades each combination from S to F across quality, speed, fit, and context length, weighted by your chosen use case.

Results are deterministic and private. Hardware detection uses the WebGPU adapter and GPU class heuristics. All scoring math runs in a single client-side bundle. No telemetry, no API calls, no data leaves your browser, which means you can run it offline once the page is cached.

Models tracked: 150+
Quantization formats: FP16 → Q2_K
Grading scale: S · A · B · C · D · F
Data privacy: Fully in-browser

Once You Know What Fits

Should You Self-Host or Just Pay per Token?

Knowing a model fits is step one. The decision tool compares the total cost of buying that GPU, renting it in the cloud, and paying per token to a frontier API at your usage.

Frequently Asked Questions

How Does the Calculator Estimate VRAM for an AI Model?

It sums three components: model weights, KV cache, and a flat 0.5 GB runtime overhead. Weights are the parameter count (in billions) multiplied by bytes per parameter, which gives a size in GB and varies by quantization: Q4_K_M uses 0.58, FP16 uses 2.0. KV cache scales with both parameter count and context length, using the formula 0.000008 × params (B) × context length (in tokens). The math is detailed in the "How we calculate" panel.
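As a rough illustration, the three-part sum can be sketched in Python. The constants are the ones quoted above; the function name, the dictionary, and the assumption that every term is in GB are ours, not taken from the calculator's source.

```python
# Constants quoted in the FAQ answer; only these two formats are given there.
BYTES_PER_PARAM = {"FP16": 2.0, "Q4_K_M": 0.58}
KV_CACHE_FACTOR = 0.000008   # assumed to yield GB per (billion params x token)
RUNTIME_OVERHEAD_GB = 0.5    # flat runtime overhead


def estimate_vram_gb(params_b: float, quant: str, context_len: int) -> float:
    """Estimate VRAM in GB as weights + KV cache + flat overhead."""
    weights_gb = params_b * BYTES_PER_PARAM[quant]
    kv_cache_gb = KV_CACHE_FACTOR * params_b * context_len
    return weights_gb + kv_cache_gb + RUNTIME_OVERHEAD_GB


if __name__ == "__main__":
    # A 7B model at a 4,096-token context, in two formats.
    for quant in ("FP16", "Q4_K_M"):
        print(quant, round(estimate_vram_gb(7, quant, 4096), 2), "GB")
```

For a 7B model at Q4_K_M with a 4,096-token context this gives 4.06 GB of weights, about 0.23 GB of KV cache, and 0.5 GB of overhead, or roughly 4.79 GB in total.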

How Are Tokens per Second Estimated Without Actually Running the Model?

Tokens per second is approximated as GPU memory bandwidth divided by model size, multiplied by an efficiency factor (0.65 for Apple Silicon unified memory, 0.70 for discrete NVIDIA GPUs) and a quantization speed multiplier. When bandwidth is unknown for a card, the calculator falls back to hardware-class constants (220 for NVIDIA, 160 for Apple Silicon). The estimate is conservative: real-world speeds with optimized backends (vLLM, llama.cpp) are usually higher.
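A minimal sketch of that bandwidth-based estimate follows. The efficiency factors come from the answer above. The function name, the hardware-class keys, and the default multiplier of 1.0 are assumptions, since the actual per-format multipliers are not stated here. The fallback constants are left out because their unit is not given.

```python
# Efficiency factors quoted in the FAQ answer; the keys are illustrative names.
EFFICIENCY = {"apple_silicon": 0.65, "nvidia_discrete": 0.70}


def estimate_tokens_per_second(
    bandwidth_gb_s: float,
    model_size_gb: float,
    hardware_class: str,
    quant_speed_multiplier: float = 1.0,  # assumption: real values not published here
) -> float:
    """Approximate decode speed as bandwidth / model size, scaled by efficiency."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    raw = bandwidth_gb_s / model_size_gb
    return raw * EFFICIENCY[hardware_class] * quant_speed_multiplier


if __name__ == "__main__":
    # Hypothetical discrete card with 1,000 GB/s bandwidth and a 4 GB model.
    print(round(estimate_tokens_per_second(1000, 4.0, "nvidia_discrete"), 1))
```

The intuition is that generating each token requires reading roughly the whole model from memory once, so bandwidth divided by model size bounds the token rate, and the efficiency factor discounts that bound.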

What Do the S, A, B, C, D, F Grades Mean?

Each model gets a composite score from 0 to 100 based on four dimensions weighted by your use case: quality (parameter tier minus quantization penalty), speed (tokens per second relative to a use-case target), fit (the sweet spot is 50–80% VRAM utilization), and context (whether the context window meets the use-case target). Score thresholds: S ≥ 85, A ≥ 70, B ≥ 55, C ≥ 40, D ≥ 20, F < 20. Models that exceed your VRAM always score F regardless of other factors.
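The threshold mapping can be sketched as below. The thresholds and the "exceeds VRAM means F" rule are from the answer above. The function name and signature are hypothetical, and the per-use-case weights that produce the composite score are not published here, so the sketch starts from a finished score.

```python
# Grade thresholds quoted in the FAQ answer, highest first.
GRADE_THRESHOLDS = [("S", 85), ("A", 70), ("B", 55), ("C", 40), ("D", 20)]


def grade(score: float, fits_in_vram: bool = True) -> str:
    """Map a 0-100 composite score to a letter grade."""
    if not fits_in_vram:
        return "F"  # models that exceed VRAM always score F
    for letter, minimum in GRADE_THRESHOLDS:
        if score >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    for s in (92, 70, 39.5, 12):
        print(s, grade(s))
    print(92, grade(92, fits_in_vram=False))
```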

Is Q4 Quantization Always the Right Trade-off?

For most users running consumer hardware, yes. Q4_K_M is the sweet spot for 7B–70B models: roughly 3.4× smaller than FP16 (0.58 versus 2.0 bytes per parameter) with minimal quality loss on most benchmarks. The calculator auto-recommends the highest-quality format that fits, so if you have headroom you will see Q5, Q6, or Q8 picked instead. Drop to Q2 or Q3 only when you have to squeeze a much bigger model into limited VRAM.
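The "highest-quality format that fits" rule can be sketched as a walk down an ordered list of formats. Only the FP16 (2.0) and Q4_K_M (0.58) byte sizes come from this page. The other byte-per-parameter values, the exact format names, and the function name are illustrative assumptions, not the calculator's real table.

```python
# Ordered from highest to lowest quality. Values marked "assumed" are
# placeholders for illustration; only FP16 and Q4_K_M are quoted on this page.
FORMATS = [
    ("FP16", 2.0),
    ("Q8_0", 1.06),    # assumed
    ("Q6_K", 0.82),    # assumed
    ("Q5_K_M", 0.69),  # assumed
    ("Q4_K_M", 0.58),
    ("Q3_K_M", 0.49),  # assumed
    ("Q2_K", 0.33),    # assumed
]


def pick_quantization(params_b: float, context_len: int, vram_gb: float):
    """Return the first (highest-quality) format whose estimate fits, else None."""
    fixed_gb = 0.000008 * params_b * context_len + 0.5  # KV cache + overhead
    for name, bytes_per_param in FORMATS:
        if params_b * bytes_per_param + fixed_gb <= vram_gb:
            return name
    return None


if __name__ == "__main__":
    print(pick_quantization(7, 4096, 24))   # plenty of headroom
    print(pick_quantization(70, 4096, 48))  # large model, tight fit
    print(pick_quantization(70, 4096, 24))  # nothing fits
```

Under these placeholder sizes, a 7B model on a 24 GB card stays at FP16, while a 70B model on 48 GB lands on Q4_K_M and does not fit at all on 24 GB.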

My Hardware Was Detected as a Different Class: Why?

Browser GPU detection is best-effort. Some integrated GPUs and laptop discrete GPUs share renderer strings that the WebGPU adapter cannot disambiguate. If the detected match is wrong, switch from "Auto-detect" to manual selection and pick the exact card from the directory.

Does the Calculator Cover Apple Silicon Correctly?

Yes. Apple M1, M2, M3, and M4 chips use a single unified memory pool that the OS dynamically partitions between system and GPU. The calculator models the effective GPU memory pool (typically 70–75% of system memory for inference) and uses a lower efficiency constant (0.65) to reflect unified-memory overhead compared to discrete VRAM.
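As a small worked example, the effective GPU pool on a unified-memory machine can be bracketed with the 70–75% range quoted above. The function name and the choice to return a (low, high) pair are assumptions for illustration.

```python
def effective_gpu_pool_gb(system_memory_gb: float,
                          low_share: float = 0.70,
                          high_share: float = 0.75) -> tuple[float, float]:
    """Return the (low, high) estimate of memory usable for inference, in GB."""
    return system_memory_gb * low_share, system_memory_gb * high_share


if __name__ == "__main__":
    low, high = effective_gpu_pool_gb(32)
    print(f"32 GB unified memory: about {low:.1f} to {high:.1f} GB for inference")
```

A 32 GB machine would therefore be treated as having roughly 22.4 to 24 GB available for model weights, KV cache, and overhead.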

Free Monthly Report

The AI Build Report

The state of AI models, API prices, and what to run where. New every month, free.

Related AI Hardware Tools

AI hardware ROI calculator

AI workstation builder

AI model rankings

GPU rental price index
