
A specialized 9B dense model tuned specifically for terminal execution, file editing, and precise tool calling within the Hermes Agent harness.
Access model weights, configuration files, and documentation.
No benchmark data available for this model yet.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 4.1 GB | Low |
| Q4_K_M (Recommended) | 6.0 GB | Good |
| Q5_K_M | 6.9 GB | Very Good |
| Q6_K | 8.0 GB | Excellent |
| Q8_0 | 10.2 GB | Near Perfect |
| FP16 | 18.8 GB | Full |
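The footprints in the table above can be roughly sanity-checked from the parameter count: a minimal sketch, assuming approximate bits-per-weight figures for common GGUF formats and a small overhead factor for buffers (both numbers are assumptions, not published specs for this model).

```python
# Rough VRAM estimate for a 9B dense model at a given quantization level.
# Bits-per-weight values are approximations for common GGUF formats; the
# 5% overhead factor for runtime buffers is an assumption.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q4_K_M": 4.8, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0,
}

def estimate_vram_gb(params_billion: float, fmt: str, overhead: float = 1.05) -> float:
    """Weights-only estimate: params * bytes-per-param, plus overhead."""
    bytes_per_param = BITS_PER_WEIGHT[fmt] / 8
    return round(params_billion * bytes_per_param * overhead, 1)

for fmt in BITS_PER_WEIGHT:
    print(fmt, estimate_vram_gb(9.0, fmt), "GB")
```

Note this excludes the KV cache, which grows with context length, so real usage at long agent contexts will sit above these figures.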
See which devices can run this model and at what quality level.
| Device | Quality | Speed | VRAM Used |
|---|---|---|---|
| Intel Arc B580 | SS | 61.0 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 4070 | SS | 67.5 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 5070 | SS | 89.9 tok/s | 6.0 GB |
| Google Cloud TPU v5e | SS | 109.6 tok/s | 6.0 GB |
| Intel Arc A770 16GB | SS | 74.9 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 4060 | SS | 36.4 tok/s | 6.0 GB |
Carnice-9b is a specialized 9B dense model developed by kai-os, engineered specifically for autonomous agent workflows within the Hermes Agent harness. Unlike general-purpose models designed for chat or leaderboard optimization, Carnice-9b is a surgical refinement of the Qwen 3.5 9B architecture. It is built to execute terminal commands, manage file systems, and navigate web browsers with high precision.
For developers building local agentic loops, Carnice-9b occupies a unique niche. While most 7B-14B models struggle with tool hallucination or fail to follow strict formatting across multi-step execution, this model was trained on harness-native traces. This makes it a primary candidate for practitioners who need a lightweight, local alternative to GPT-4o for driving autonomous developer agents.
Carnice-9b is a dense transformer model with 9 billion parameters. It is a merged standalone checkpoint, meaning it functions as a complete model without requiring separate PEFT adapters during inference.
The model’s training was divided into two distinct phases to ensure both logic and execution remained sharp:
1. Reasoning retention: fine-tuning on Bespoke-Stratos-17k and NuminaMath-CoT to ensure the model retained logical consistency after the base merge.
2. Harness alignment: supervised fine-tuning on OpenThoughts-Agent-v1-SFT. This stage specifically tuned the model to the exact message patterns and tool-calling schemas expected by the Hermes harness.

Because it is based on the Qwen 3.5 architecture, the model benefits from efficient tokenization and attention mechanisms, making it highly responsive on consumer-grade hardware. It is released under the Apache 2.0 license, allowing for broad local deployment and integration into proprietary internal agent pipelines.
The primary strength of Carnice-9b for Hermes agent is its adherence to structured action outputs. It is not designed to be your next creative writing assistant; it is designed to be a "driver" for a terminal.
The model excels at multi-turn tool calling where it must read a file, reason about changes, and output precise edits. Because it was trained on harness-native behavior, it is less likely to break the XML or JSON formatting required by agent frameworks like Hermes.
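A harness typically enforces this formatting by refusing to act on output that does not parse. A minimal sketch of that gate, assuming a hypothetical JSON tool-call envelope with `name` and `arguments` fields (the real Hermes schema may differ):

```python
import json

# Hypothetical tool-call envelope; the actual Hermes harness schema may differ.
def parse_tool_call(raw: str) -> dict:
    """Parse and minimally validate a JSON tool call emitted by the model."""
    call = json.loads(raw)  # raises ValueError on malformed JSON
    for key in ("name", "arguments"):
        if key not in call:
            raise ValueError(f"missing required field: {key}")
    if not isinstance(call["arguments"], dict):
        raise ValueError("arguments must be a JSON object")
    return call

raw = '{"name": "edit_file", "arguments": {"path": "src/main.py", "patch": "..."}}'
call = parse_tool_call(raw)
print(call["name"])  # edit_file
```

A model trained on harness-native traces fails this gate less often, which is exactly the property the paragraph above describes.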
Carnice-9b is tuned for terminal-heavy tasks. This includes navigating directories, executing shell scripts, and interpreting error logs to self-correct. When paired with a browser tool, it can handle web-assisted research tasks, extracting data to inform its next terminal-based action.
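The execute-and-self-correct loop described above can be sketched as follows. This is a simplified illustration, not the Hermes harness's actual implementation: each action is run, and the exit code plus combined output would be fed back to the model as its next observation.

```python
import subprocess

def run_step(cmd: list[str]) -> tuple[int, str]:
    """Run one terminal action and capture everything the model needs
    to see: the exit code and the combined stdout/stderr log."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr

# On a nonzero exit code, the error log is returned to the model so it
# can interpret the failure and propose a corrected command.
code, log = run_step(["python", "-c", "print('ok')"])
```

The key design point is capturing stderr alongside stdout: error logs are the signal the model uses to self-correct.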
Generic models often fail at function calling when the schema becomes complex. Carnice-9b’s training centered on "harness-native action structure," meaning it has a higher success rate in generating valid tool calls that the Hermes runtime can actually parse and execute without human intervention.
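"Valid tool calls the runtime can parse" means more than well-formed JSON: the arguments must also match the tool's declared schema. A minimal sketch of that check, using a hypothetical tool definition (real harnesses typically declare tools in JSON Schema, which is stricter than this):

```python
# Hypothetical tool definition; real harnesses usually use JSON Schema.
TOOL_SCHEMA = {
    "name": "read_file",
    "parameters": {"path": str, "max_bytes": int},
    "required": ["path"],
}

def validate_arguments(args: dict, schema: dict) -> list[str]:
    """Return a list of problems with a tool call's arguments (empty = valid)."""
    errors = []
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required argument: {field}")
    for field, value in args.items():
        expected = schema["parameters"].get(field)
        if expected is None:
            errors.append(f"unknown argument: {field}")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

print(validate_arguments({"path": "README.md"}, TOOL_SCHEMA))  # []
```

A model with a higher valid-call rate spends fewer turns bouncing off this validator, which is what makes unattended execution practical.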
To run Carnice-9b for Hermes agent locally, you need to consider the VRAM footprint of the 9B parameter count alongside the overhead of the agent harness itself.
For most practitioners, Q4_K_M is the fastest way to run Carnice-9b for Hermes agent locally without a significant drop in reasoning. However, if the agent is performing complex multi-file refactoring, the Q5_K_M quantization (approx. 6.9 GB) is the "sweet spot" for maintaining tool-calling precision while keeping inference speeds high.
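The trade-off above can be automated: pick the highest-quality format that fits your card, using the footprint figures from the table earlier on this page. A small sketch; the 1 GB headroom default for KV cache and harness overhead is an assumption.

```python
# Footprints from the quantization table above, ordered low -> high quality.
QUANTS = [
    ("Q2_K", 4.1), ("Q4_K_M", 6.0), ("Q5_K_M", 6.9),
    ("Q6_K", 8.0), ("Q8_0", 10.2), ("FP16", 18.8),
]

def pick_quant(vram_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the best-quality format that fits, leaving headroom for the
    KV cache and agent harness (the 1 GB default is an assumption)."""
    usable = vram_gb - headroom_gb
    best = None
    for fmt, need in QUANTS:
        if need <= usable:
            best = fmt
    if best is None:
        raise ValueError("not enough VRAM even for Q2_K")
    return best

print(pick_quant(8.0))   # Q5_K_M on an 8 GB card
```

On an 8 GB card this lands on Q5_K_M, matching the "sweet spot" recommendation above; a 12 GB card can step up to Q8_0.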
Serving the model through a dedicated inference engine (llama.cpp or vLLM) is advised.

When evaluating Carnice-9b for Hermes agent performance, it is best compared against other "agent-first" or small-footprint models.
If your goal is to build a local AI software engineer that can actually interact with your file system, Carnice-9b is currently one of the most optimized 9B models for that specific task.