Moonshot AI's coding-focused successor to Kimi K2.6, a 1T-parameter Mixture-of-Experts model that activates 32B parameters per token across 384 experts (8 active plus 1 shared). It is multimodal with text, image, and video input, supports a 256K-token context, and always runs in thinking mode while using roughly 30% fewer reasoning tokens than K2.6. On Moonshot's own benchmarks it scores 62.0 on Kimi Code Bench v2, 53.6 on Program Bench, 35.1 on MLS Bench Lite, and 81.1 on MCP Mark Verified. Weights ship open under a Modified MIT License.
A workable 1T-parameter MoE language model from Moonshot AI. Its strongest result is on graduate-level reasoning (GPQA, 90/100), so reach for it when that is the dimension that matters. It is newly released, so its production readiness is still unproven.
Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 79.4 GB | Low |
| Q4_K_M (Recommended) | 86.2 GB | Good |
| Q5_K_M | 89.4 GB | Very Good |
| Q6_K | 93.2 GB | Excellent |
| Q8_0 | 101.2 GB | Near Perfect |
| FP16 | 131.6 GB | Full |
See which devices can run this model and at what quality level.
| Device | Vendor | Rating | Speed | VRAM Required |
|---|---|---|---|---|
| NVIDIA H200 SXM 141GB | NVIDIA | SS | 44.8 tok/s | 86.2 GB |
| | | SS | 49.5 tok/s | 86.2 GB |
| Google TPU v7 (Ironwood) | Google | SS | 68.9 tok/s | 86.2 GB |
| NVIDIA B200 GPU | NVIDIA | SS | 74.7 tok/s | 86.2 GB |
| | | SS | 56.1 tok/s | 86.2 GB |
| | | SS | 74.7 tok/s | 86.2 GB |
| | | SS | 34.6 tok/s | 86.2 GB |
| | | SS | 66.3 tok/s | 86.2 GB |
| | | SS | 66.3 tok/s | 86.2 GB |
| | | SS | 66.3 tok/s | 86.2 GB |
| | | SS | 66.3 tok/s | 86.2 GB |
| SuperMicro Super AI Station | SuperMicro | SS | 66.3 tok/s | 86.2 GB |
| Gigabyte W775-V10-L01 | Gigabyte | SS | 66.3 tok/s | 86.2 GB |
| | | AA | 22.9 tok/s | 86.2 GB |
| Google Cloud TPU v5p | Google | AA | 25.8 tok/s | 86.2 GB |
| | | BB | 7.5 tok/s | 86.2 GB |
| | | BB | 7.5 tok/s | 86.2 GB |
| | | BB | 5.7 tok/s | 86.2 GB |
| | | BB | 5.7 tok/s | 86.2 GB |
| | | BB | 5.7 tok/s | 86.2 GB |
| | | BB | 5.1 tok/s | 86.2 GB |
| | | BB | 5.1 tok/s | 86.2 GB |
| | | BB | 5.1 tok/s | 86.2 GB |
| | | BB | 5.1 tok/s | 86.2 GB |
| | | BB | 2.6 tok/s | 86.2 GB |
Energy cost on NVIDIA A100 SXM4 80GB (~19 tok/s, Q4_K_M) vs flagship API pricing.
| Source | Details | Cost per 1M tokens |
|---|---|---|
| Local (energy only) | Kimi K2.7 Code on NVIDIA A100 SXM4 80GB · ~19 tok/s · 400W | $0.700 |
| GPT-5.5 | OpenAI · in $5.00 · out $30.00 | $12.50 |
| Claude Opus 4.7 Thinking | Anthropic · in $5.00 · out $25.00 | $11.00 |
| Gemini 3.5 Flash | Google · in $1.50 · out $9.00 | $3.75 |
| Grok 4.3 | xAI · in $1.25 · out $2.50 | $1.63 |
API prices blended at 70% input / 30% output.
Hardware amortisation not included.
Cheapest current cloud rentals with at least 86 GB VRAM, refreshed hourly.
| GPU | Provider | Tier | VRAM | Cost / GPU-hour |
|---|---|---|---|---|
| AMD Instinct MI300X | RunPod | Community | 192 GB | $0.50 |
| NVIDIA H200 NVL | RunPod | Community | 141 GB | $0.50 |
| NVIDIA H200 NVL | Vast.ai | Spot | 141 GB | $1.67 |
| NVIDIA H200 SXM | Vast.ai | Spot | 141 GB | $1.93 |
| NVIDIA H200 NVL | Vast.ai | On-Demand | 141 GB | $1.94 |
Per-GPU rate across RunPod and the Vast.ai marketplace.
Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.
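To compare an hourly rental against per-token API pricing, the hourly rate has to be converted into a cost per million tokens. The sketch below is a rough illustration with a hypothetical function name. It assumes a single stream running at a steady throughput, and it pairs the $1.93/hour H200 SXM spot price with the 44.8 tok/s estimate listed for the NVIDIA H200 SXM earlier on this page. That pairing is an assumption, not something the page computes.

```python
def rental_cost_per_million_tokens(usd_per_gpu_hour, tokens_per_second, num_gpus=1):
    """Rental cost (USD) to generate 1M tokens at a steady single-stream rate."""
    hours = 1_000_000 / tokens_per_second / 3600
    return usd_per_gpu_hour * num_gpus * hours


# Assumed pairing: H200 SXM spot price with the H200 SXM throughput estimate.
cost = rental_cost_per_million_tokens(1.93, 44.8)
print(f"${cost:.2f} per 1M tokens")
```

Batching several requests on the same GPU raises aggregate throughput and lowers the effective cost, so a single-stream figure like this is best read as an upper bound for a busy server.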
Moonshot AI’s Kimi K2.7 Code is a 1-trillion-parameter Mixture-of-Experts model built specifically for agentic coding and software engineering. It activates only 32 billion parameters per token (about 3.2% of the 1T total), a sparse, compute-efficient design for long-horizon tasks like end-to-end code generation, refactoring, and autonomous debugging.
As the successor to K2.6, K2.7 Code reports substantial benchmark gains: +21.8% on Moonshot’s in-house Kimi Code Bench v2 (62.0 vs. 50.9), +11.0% on Program Bench, and +31.5% on MLS Bench Lite. It also uses roughly 30% fewer reasoning tokens, so it reaches answers with less output. The model handles text, image, and video input, supports a 262,144-token (256K) context, and always operates in thinking mode.
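The Kimi Code Bench v2 gain is a relative improvement over the K2.6 score, which can be checked directly from the two numbers quoted. The sketch below uses hypothetical helper names, and the 10,000-token baseline in the second example is an invented figure for illustration, not a measurement.

```python
def relative_gain_percent(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100


def tokens_after_reduction(baseline_tokens, reduction=0.30):
    """Reasoning tokens remaining after a fractional reduction."""
    return baseline_tokens * (1 - reduction)


# Kimi Code Bench v2: 62.0 (K2.7 Code) vs. 50.9 (K2.6).
gain = relative_gain_percent(62.0, 50.9)
print(f"+{gain:.1f}%")

# Illustrative only: a task that took 10,000 reasoning tokens on K2.6.
print(tokens_after_reduction(10_000))
```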
K2.7 Code is released under a Modified MIT License — open weights that you can inspect, fine-tune, and deploy. It’s positioned as a direct competitor to closed coding agents like GPT-5.5 and Claude Opus 4.8, but with the transparency and control that local-first practitioners demand.
K2.7 Code uses a sparse MoE layout with 384 experts, of which 8 are selected per token plus 1 shared expert. The total parameter count is 1T, but only 32B are active at any inference step. This means:
The model includes a 400M-parameter MoonViT vision encoder, enabling multimodal understanding for image and video inputs. The context window is 262,144 tokens — enough for large code repositories, entire documentation sets, or long conversation histories.
| Parameter | Value |
|---|---|
| Total parameters | 1T (1000B) |
| Active parameters per token | 32B |
| Number of experts | 384 (8 active + 1 shared) |
| Context length | 262,144 tokens |
| Attention mechanism | MLA |
| Activation | SwiGLU |
| Vision encoder | MoonViT (400M) |
| Vocabulary | 160K tokens |
| License | Modified MIT |
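To make the "8 of 384 experts plus 1 shared" layout concrete, here is a minimal sketch of generic top-k softmax routing for a single token. It is not Moonshot's router: the gating function, the random logits, and the names `route_token`, `NUM_EXPERTS`, and `TOP_K` are all assumptions for illustration. Only the expert counts come from the table above.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts (from the spec table)
TOP_K = 8          # routed experts selected per token
NUM_SHARED = 1     # shared expert that runs for every token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    max_logit = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores; a real router computes these from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
experts_run = len(selected) + NUM_SHARED
print(f"{experts_run} of {NUM_EXPERTS + NUM_SHARED} expert networks run for this token")
```

Because only the selected experts and the shared expert execute, per-token compute tracks the 32B active parameters, while all expert weights must still be held in memory.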
K2.7 Code is specialized for agentic coding workflows — tasks that require multi-step reasoning, tool use, and interaction with the filesystem and shell. It excels at:
Because it’s open weight, you can fine-tune it on your own codebase or domain-specific rules. The Kimi Code CLI (available via curl install) provides a ready-to-use terminal interface that wraps K2.7 Code with file I/O, search, and subagent spawning capabilities.
Running a 1T-parameter model locally demands serious hardware. Here’s what you need to know:
| Configuration | Total VRAM | Feasibility |
|---|---|---|
| Single RTX 4090 or M4 Max | 24 GB (RTX 4090) or 64–128 GB unified (M4 Max) | ❌ No: not enough for Q4 or higher |
| 4× RTX 4090 | 96 GB | ⚠️ Marginal: possible with Q2 and CPU offloading |
| 1× A100 80 GB (with quantization and offloading) | 80 GB | ❌ No: Q2 still needs > 200 GB |
| 8× A100 80 GB | 640 GB | ✅ Yes: fits Q4_K_M |
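The feasibility calls in this table follow from a back-of-the-envelope estimate: weights-only memory is roughly total parameters × bits per weight ÷ 8. The sketch below uses hypothetical function names and nominal bit widths (2 bits for Q2, 4 bits for Q4) as a simplifying assumption. Real quantization formats store extra scale metadata, and the KV cache and activations need memory too, so actual requirements are higher.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters


def weights_memory_gb(total_params, bits_per_weight):
    """Weights-only footprint in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return total_params * bits_per_weight / 8 / 1e9


def fits(total_vram_gb, bits_per_weight, total_params=TOTAL_PARAMS):
    """True if the weights alone fit in the given amount of VRAM."""
    return weights_memory_gb(total_params, bits_per_weight) <= total_vram_gb


for label, vram_gb, bits in [
    ("4x RTX 4090, ~2-bit", 96, 2),
    ("1x A100 80 GB, ~2-bit", 80, 2),
    ("8x A100 80 GB, ~4-bit", 640, 4),
]:
    need = weights_memory_gb(TOTAL_PARAMS, bits)
    print(f"{label}: needs ~{need:.0f} GB, fits in VRAM: {fits(vram_gb, bits)}")
```

Note that the estimate uses the full 1T parameter count, not the 32B active per token: every expert has to be resident (or offloaded) because any of them may be selected for the next token.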
Unless you have a multi-GPU workstation or cloud GPU cluster, consider:
With a proper multi-GPU setup using 8× H100, you can expect 50–100 tokens per second for text generation, depending on batch size and context length. The 30% reduction in reasoning tokens means fewer output tokens per task compared to K2.6.
K2.7 Code competes directly with other open-weight coding models, but its scale is unique. Here’s how it stacks up:
| Model | Total / Active Parameters | Context | Coding Benchmarks (similar conditions) |
|---|---|---|---|
| Kimi K2.7 Code | 1T / 32B | 256K | Kimi Code Bench v2: 62.0 |
| DeepSeek-Coder-V2 | 236B / 21B | 128K | SWE-bench verified: ~20–25% (older) |
| Qwen2.5-Coder-32B | 32B (dense) | 128K | Aider-polyglot: ~35–40% (dense) |
| GPT-5.5 (closed) | – | – | 69.0 on Kimi Code Bench v2 |
| Claude Opus 4.8 (closed) | – | – | 67.4 on Kimi Code Bench v2 |
Choose K2.7 Code if: You need a powerful open coding agent for complex, multi-file, long-context tasks and have access to multi-GPU infrastructure. Its sparse 32B active parameters keep inference fast once the model is loaded.
Choose a smaller model (e.g., Qwen2.5-Coder-32B) if: You’re limited to a single consumer GPU. You’ll lose some agentic capability but gain deployability.
Benchmark caveat: All K2.7 Code scores are from Moonshot’s own benchmarks. Independent evaluations on SWE-bench or Aider are not yet available, so treat the numbers as indicative rather than definitive.
