Live per-token prices across every hosted AI model on OpenRouter. Compare input, output, and blended cost across OpenAI, Anthropic, Google, Mistral, and 50+ other providers. Refreshed hourly.
Sortable, searchable price table for every hosted model on OpenRouter. Click a column header to sort.
Model | Model ID | Provider | Context | Input ($/1M tokens) | Output ($/1M tokens) | Blended ($/1M tokens) |
inclusionAI: Ling-2.6-flash | inclusionai/ling-2.6-flash | inclusionai | 262K | $0.010 | $0.030 | $0.016 |
Mistral: Mistral Nemo | mistralai/mistral-nemo | mistralai | 131K | $0.020 | $0.030 | $0.023 |
Meta: Llama 3.1 8B Instruct | meta-llama/llama-3.1-8b-instruct | meta-llama | 131K | $0.020 | $0.050 | $0.029 |
Meta: Llama 3 8B Instruct | meta-llama/llama-3-8b-instruct | meta-llama | 8K | $0.040 | $0.040 | $0.040 |
Sao10K: Llama 3 8B Lunaris | sao10k/l3-lunaris-8b | sao10k | 8K | $0.040 | $0.050 | $0.043 |
IBM: Granite 4.0 Micro | ibm-granite/granite-4.0-h-micro | ibm-granite | 131K | $0.017 | $0.112 | $0.046 |
Google: Gemma 3 4B | google/gemma-3-4b-it | google | 131K | $0.040 | $0.080 | $0.052 |
LiquidAI: LFM2-24B-A2B | liquid/lfm-2-24b-a2b | liquid | 128K | $0.030 | $0.120 | $0.057 |
Qwen: Qwen2.5 7B Instruct | qwen/qwen-2.5-7b-instruct | qwen | 131K | $0.040 | $0.100 | $0.058 |
Mistral: Mistral Small 3 | mistralai/mistral-small-24b-instruct-2501 | mistralai | 33K | $0.050 | $0.080 | $0.059 |
MythoMax 13B | gryphe/mythomax-l2-13b | gryphe | 4K | $0.060 | $0.060 | $0.060 |
OpenAI: gpt-oss-20b | openai/gpt-oss-20b | openai | 131K | $0.030 | $0.140 | $0.063 |
IBM: Granite 4.1 8B | ibm-granite/granite-4.1-8b | ibm-granite | 131K | $0.050 | $0.100 | $0.065 |
Amazon: Nova Micro 1.0 | amazon/nova-micro-v1 | amazon | 128K | $0.035 | $0.140 | $0.067 |
Google: Gemma 3 12B | google/gemma-3-12b-it | google | 131K | $0.040 | $0.130 | $0.067 |
Cohere: Command R7B (12-2024) | cohere/command-r7b-12-2024 | cohere | 128K | $0.037 | $0.150 | $0.071 |
Qwen: Qwen3.5-9B | qwen/qwen3.5-9b | qwen | 262K | $0.040 | $0.150 | $0.073 |
NVIDIA: Nemotron Nano 9B V2 | nvidia/nemotron-nano-9b-v2 | nvidia | 131K | $0.040 | $0.160 | $0.076 |
Arcee AI: Trinity Mini | arcee-ai/trinity-mini | arcee-ai | 131K | $0.045 | $0.150 | $0.077 |
Google: Gemma 3n 4B | google/gemma-3n-e4b-it | google | 33K | $0.060 | $0.120 | $0.078 |
Meta: Llama 3.2 1B Instruct | meta-llama/llama-3.2-1b-instruct | meta-llama | 131K | $0.027 | $0.201 | $0.079 |
Qwen: Qwen3 235B A22B Instruct 2507 | qwen/qwen3-235b-a22b-2507 | qwen | 262K | $0.071 | $0.100 | $0.080 |
OpenAI: gpt-oss-120b | openai/gpt-oss-120b | openai | 131K | $0.039 | $0.180 | $0.081 |
Microsoft: Phi 4 | microsoft/phi-4 | microsoft | 16K | $0.065 | $0.140 | $0.088 |
NVIDIA: Nemotron 3 Nano 30B A3B | nvidia/nemotron-3-nano-30b-a3b | nvidia | 262K | $0.050 | $0.200 | $0.095 |
Reka Edge | rekaai/reka-edge | rekaai | 16K | $0.100 | $0.100 | $0.100 |
Mistral: Ministral 3 3B 2512 | mistralai/ministral-3b-2512 | mistralai | 131K | $0.100 | $0.100 | $0.100 |
Z.ai: GLM 4 32B | z-ai/glm-4-32b | z-ai | 128K | $0.100 | $0.100 | $0.100 |
Google: Gemma 3 27B | google/gemma-3-27b-it | google | 131K | $0.080 | $0.160 | $0.104 |
Mistral: Mistral Small 3.2 24B | mistralai/mistral-small-3.2-24b-instruct | mistralai | 128K | $0.075 | $0.200 | $0.113 |
Amazon: Nova Lite 1.0 | amazon/nova-lite-v1 | amazon | 300K | $0.060 | $0.240 | $0.114 |
Qwen: Qwen3.5-Flash | qwen/qwen3.5-flash-02-23 | qwen | 1.0M | $0.065 | $0.260 | $0.124 |
Tencent: Hy3 preview | tencent/hy3-preview | tencent | 262K | $0.066 | $0.260 | $0.124 |
Qwen: Qwen3 Coder 30B A3B Instruct | qwen/qwen3-coder-30b-a3b-instruct | qwen | 160K | $0.070 | $0.270 | $0.130 |
ByteDance: UI-TARS 7B | bytedance/ui-tars-1.5-7b | bytedance | 128K | $0.100 | $0.200 | $0.130 |
Reka Flash 3 | rekaai/reka-flash-3 | rekaai | 66K | $0.100 | $0.200 | $0.130 |
Baidu: ERNIE 4.5 21B A3B Thinking | baidu/ernie-4.5-21b-a3b-thinking | baidu | 131K | $0.070 | $0.280 | $0.133 |
Baidu: ERNIE 4.5 21B A3B | baidu/ernie-4.5-21b-a3b | baidu | 131K | $0.070 | $0.280 | $0.133 |
Mistral: Mistral 7B Instruct v0.1 | mistralai/mistral-7b-instruct-v0.1 | mistralai | 4K | $0.110 | $0.190 | $0.134 |
Meta: Llama 3.2 3B Instruct | meta-llama/llama-3.2-3b-instruct | meta-llama | 131K | $0.051 | $0.335 | $0.136 |
Qwen: Qwen3 32B | qwen/qwen3-32b | qwen | 131K | $0.080 | $0.280 | $0.140 |
NousResearch: Hermes 2 Pro - Llama-3 8B | nousresearch/hermes-2-pro-llama-3-8b | nousresearch | 8K | $0.140 | $0.140 | $0.140 |
Google: Gemma 4 26B A4B | google/gemma-4-26b-a4b-it | google | 262K | $0.060 | $0.330 | $0.141 |
Qwen: Qwen3 14B | qwen/qwen3-14b | qwen | 132K | $0.100 | $0.240 | $0.142 |
ByteDance Seed: Seed 1.6 Flash | bytedance-seed/seed-1.6-flash | bytedance-seed | 262K | $0.075 | $0.300 | $0.143 |
OpenAI: gpt-oss-safeguard-20b | openai/gpt-oss-safeguard-20b | openai | 131K | $0.075 | $0.300 | $0.143 |
Google: Gemini 2.0 Flash Lite | google/gemini-2.0-flash-lite-001 | google | 1.0M | $0.075 | $0.300 | $0.143 |
DeepSeek: DeepSeek V4 Flash | deepseek/deepseek-v4-flash | deepseek | 1.0M | $0.112 | $0.224 | $0.146 |
Meta: Llama 4 Scout | meta-llama/llama-4-scout | meta-llama | 10.0M | $0.080 | $0.300 | $0.146 |
EssentialAI: Rnj 1 Instruct | essentialai/rnj-1-instruct | essentialai | 33K | $0.150 | $0.150 | $0.150 |
Blended prices are weighted at 70% input and 30% output tokens by default; use the blend selector to adjust the mix.
AI API pricing is the per-token cost that hosted model providers charge for sending text to a model and receiving text back. Input tokens are the prompt; output tokens are the completion. Most labs publish two rates per model, quoted in US dollars per one million tokens, and update them several times a year.
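To make the per-million-token quoting concrete, here is a minimal sketch of how the two rates turn into the cost of a single request. The function name `request_cost_usd` and the token counts are illustrative assumptions; the rates are the Llama 3.1 8B Instruct figures from the table above.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one request.

    Prices are quoted per one million tokens, so each token count is
    multiplied by its rate and the sum is divided by 1,000,000.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens
# at $0.020 input / $0.050 output per 1M tokens.
cost = request_cost_usd(2_000, 500, 0.020, 0.050)
print(f"${cost:.6f}")  # $0.000065
```

At these rates a single request costs a small fraction of a cent, which is why the table quotes prices per million tokens rather than per token.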
This page reads the public OpenRouter model feed live and shows the current input price, output price, and a blended price for every hosted AI model OpenRouter exposes. The feed covers OpenAI (GPT-5, GPT-5 mini, o1, o3), Anthropic (Claude 4.6 Sonnet, Claude 4.6 Haiku, Claude Opus), Google (Gemini 3.1 Pro, Gemini 3.5 Flash), Meta Llama, Mistral, DeepSeek, Qwen, and roughly 50 other providers.
Prices update hourly. The blended column is a weighted average at a 70 percent input, 30 percent output token mix, which approximates typical chat and coding workloads. Use the blend selector to reweight for input-heavy retrieval pipelines or output-heavy generation tasks.
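The blended figure can be reproduced by hand. The sketch below assumes the blend is a plain weighted average of the two rates, as described above; the function name `blended_price` and the `input_share` parameter are illustrative, not part of any published API.

```python
def blended_price(input_price, output_price, input_share=0.70):
    """Weighted average of input and output price (both in $ per 1M tokens).

    input_share is the fraction of tokens that are input; the remainder
    is treated as output. The default is the 70/30 mix.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# inclusionAI: Ling-2.6-flash is listed at $0.010 input and $0.030 output.
default_mix = blended_price(0.010, 0.030)
output_heavy = blended_price(0.010, 0.030, input_share=0.30)
print(round(default_mix, 3))   # 0.016, matching the Blended column
print(round(output_heavy, 3))  # 0.024 at a 30/70 mix
```

Because output tokens usually cost more, shifting the mix toward output raises the blended price for the same model.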

Run the Numbers
Take any model in the table and compare its API cost against buying a GPU and renting one in the cloud. The decision tool plugs the live API price in for you.

Find the break-even point between local hardware and cloud API spend.

Live cloud GPU rental rates across RunPod and the Vast.ai marketplace.

Every open and closed model we track, ranked by benchmark score and hardware fit.

Spec a full local AI rig matched to your budget, with curated parts and pre-built options.
The live table refreshes hourly. We call the public OpenRouter API directly and cache the result for one hour so a traffic spike never overwhelms the upstream. OpenRouter itself aggregates pricing from each provider, so the rates here track the official OpenAI, Anthropic, Google, Mistral, and DeepSeek prices without scraping.
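The one-hour cache described above can be sketched as follows. This is an illustration, not our production code: the class name `HourlyCache` is hypothetical, and the endpoint URL and response shape (a `data` list whose entries carry per-token `pricing` strings) are assumptions about the OpenRouter models feed that should be checked against its documentation.

```python
import json
import time
import urllib.request

# Assumed public endpoint for the OpenRouter model feed.
MODELS_URL = "https://openrouter.ai/api/v1/models"
TTL_SECONDS = 3600  # one hour


def fetch_models(url=MODELS_URL):
    """Download the model list; assumes a JSON body with a top-level 'data' list."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]


class HourlyCache:
    """Call `fetch` at most once per `ttl` seconds and reuse the result in between."""

    def __init__(self, fetch, ttl=TTL_SECONDS, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()  # only this path reaches the upstream
            self._fetched_at = now
        return self._value


def to_per_million(price_per_token):
    """Convert a per-token price (assumed to arrive as a string) to $ per 1M tokens."""
    return float(price_per_token) * 1_000_000


# Usage: every page view calls models_cache.get(); the upstream sees
# at most one request per hour however many visitors arrive.
models_cache = HourlyCache(fetch_models)
```

Injecting the clock and the fetch function keeps the cache testable without network access.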
Every major AI lab charges separately for tokens you send in (the prompt) and tokens the model writes out (the completion). Output tokens are usually two to five times more expensive because generation is the compute-heavy step. The blended column shows a weighted average at the input/output mix you pick.
The blended price is a single per-million-token rate that combines input and output cost at a chosen ratio. The default is 70 percent input, 30 percent output, which roughly matches typical chat and coding workloads where prompts are longer than answers. Switch the blend selector to 50/50 or 30/70 to reweight for your traffic.
Free-tier and promotional models on OpenRouter return $0 for both input and output. We surface them when the Free filter is on, but the sync job that copies prices into our reference model pages skips them on purpose so a temporary promotion never overwrites a real listed price.
A daily background job pulls this same OpenRouter feed and writes the input and output prices onto every reference model in our directory that we have linked to an OpenRouter ID. That means model detail pages and the ROI and decision calculators read the same live numbers without manual edits.
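The sync rule, including the deliberate skip for $0 listings, can be sketched like this. The record layout (`openrouter_id`, `input_price`, `output_price` on reference models; `id`, `input`, `output` on feed entries) and the function name `sync_prices` are hypothetical and chosen only for illustration.

```python
def sync_prices(reference_models, feed):
    """Copy live prices onto reference models linked to an OpenRouter ID.

    reference_models: list of dicts, optionally carrying 'openrouter_id'.
    feed: list of dicts with 'id', 'input', and 'output' ($ per 1M tokens).
    Returns the number of reference models that were updated.
    """
    live_by_id = {entry["id"]: entry for entry in feed}
    updated = 0
    for ref in reference_models:
        live = live_by_id.get(ref.get("openrouter_id"))
        if live is None:
            continue  # not linked, or no longer present in the feed
        if live["input"] == 0 and live["output"] == 0:
            continue  # free or promotional listing: keep the real listed price
        ref["input_price"] = live["input"]
        ref["output_price"] = live["output"]
        updated += 1
    return updated


# Made-up sample data for illustration only.
refs = [
    {"name": "Model A", "openrouter_id": "vendor/model-a", "input_price": 1.0, "output_price": 2.0},
    {"name": "Model B", "openrouter_id": "vendor/model-b-free", "input_price": 0.5, "output_price": 0.5},
    {"name": "Model C"},  # no OpenRouter link
]
feed = [
    {"id": "vendor/model-a", "input": 0.02, "output": 0.05},
    {"id": "vendor/model-b-free", "input": 0.0, "output": 0.0},
]
print(sync_prices(refs, feed))  # 1
```

Only Model A is updated here; Model B keeps its previous price because its feed entry is $0 on both sides, and Model C is untouched because it has no linked ID.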
These figures are suitable for internal modelling: they are the same per-million-token rates the providers publish. For anything you bill a client on, treat this tracker as a "current as of the last refresh" view and confirm against the provider invoice or pricing page. Prices change without notice on the provider side.