
A specialized 9B dense model tuned specifically for terminal execution, file editing, and precise tool calling within the Hermes Agent harness.
Access model weights, configuration files, and documentation.
No benchmark data available for this model yet.
See how different quantization levels affect VRAM requirements and quality for this model.
| Format | VRAM Required | Quality |
|---|---|---|
| Q2_K | 4.1 GB | Low |
| Q4_K_M (Recommended) | 6.0 GB | Good |
| Q5_K_M | 6.9 GB | Very Good |
| Q6_K | 8.0 GB | Excellent |
| Q8_0 | 10.2 GB | Near Perfect |
| FP16 | 18.8 GB | Full |
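The footprints in the table above can be roughly sanity-checked from the parameter count: a minimal sketch, assuming approximate bits-per-weight figures for common GGUF formats and a small overhead factor for buffers (both numbers are assumptions, not published specs for this model).

```python
# Rough VRAM estimate for a 9B dense model at a given quantization level.
# Bits-per-weight values are approximations for common GGUF formats; the
# 5% overhead factor for runtime buffers is an assumption.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q4_K_M": 4.8, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0,
}

def estimate_vram_gb(params_billion: float, fmt: str, overhead: float = 1.05) -> float:
    """Weights-only estimate: params * bytes-per-param, plus overhead."""
    bytes_per_param = BITS_PER_WEIGHT[fmt] / 8
    return round(params_billion * bytes_per_param * overhead, 1)

for fmt in BITS_PER_WEIGHT:
    print(fmt, estimate_vram_gb(9.0, fmt), "GB")
```

Note this excludes the KV cache, which grows with context length, so real usage at long agent contexts will sit above these figures.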
See which devices can run this model and at what quality level.
| Device | Quality | Speed | VRAM Used |
|---|---|---|---|
| Intel Arc B580 | SS | 61.0 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 4070 | SS | 67.5 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 5070 | SS | 89.9 tok/s | 6.0 GB |
| Google Cloud TPU v5e | SS | 109.6 tok/s | 6.0 GB |
| Intel Arc A770 16GB | SS | 74.9 tok/s | 6.0 GB |
| NVIDIA GeForce RTX 4060 | SS | 36.4 tok/s | 6.0 GB |
Carnice-9b is a specialized 9B dense model developed by kai-os, engineered specifically for autonomous agent workflows within the Hermes Agent harness. Unlike general-purpose models designed for chat or leaderboard optimization, Carnice-9b is a surgical refinement of the Qwen 3.5 9B architecture. It is built to execute terminal commands, manage file systems, and navigate web browsers with high precision.
For developers building local agentic loops, Carnice-9b occupies a unique niche. While most 7B-14B models struggle with tool hallucination or fail to follow strict formatting across multi-step execution, this model was trained on harness-native traces. This makes it a primary candidate for practitioners who need a lightweight, local alternative to GPT-4o for driving autonomous developer agents.
Carnice-9b is a dense transformer model with 9 billion parameters. It is a merged standalone checkpoint, meaning it functions as a complete model without requiring separate PEFT adapters during inference.
The model’s training was divided into two distinct phases to ensure both logic and execution remained sharp:
1. Reasoning retention: fine-tuning on Bespoke-Stratos-17k and NuminaMath-CoT to ensure the model retained logical consistency after the base merge.
2. Harness alignment: supervised fine-tuning on OpenThoughts-Agent-v1-SFT. This stage specifically tuned the model to the exact message patterns and tool-calling schemas expected by the Hermes harness.

Because it is based on the Qwen 3.5 architecture, the model benefits from efficient tokenization and attention mechanisms, making it highly responsive on consumer-grade hardware. It is released under the Apache 2.0 license, allowing for broad local deployment and integration into proprietary internal agent pipelines.
The primary strength of Carnice-9b for Hermes agent is its adherence to structured action outputs. It is not designed to be your next creative writing assistant; it is designed to be a "driver" for a terminal.
The model excels at multi-turn tool calling where it must read a file, reason about changes, and output precise edits. Because it was trained on harness-native behavior, it is less likely to break the XML or JSON formatting required by agent frameworks like Hermes.
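A harness typically enforces this formatting by refusing to act on output that does not parse. A minimal sketch of that gate, assuming a hypothetical JSON tool-call envelope with `name` and `arguments` fields (the real Hermes schema may differ):

```python
import json

# Hypothetical tool-call envelope; the actual Hermes harness schema may differ.
def parse_tool_call(raw: str) -> dict:
    """Parse and minimally validate a JSON tool call emitted by the model."""
    call = json.loads(raw)  # raises ValueError on malformed JSON
    for key in ("name", "arguments"):
        if key not in call:
            raise ValueError(f"missing required field: {key}")
    if not isinstance(call["arguments"], dict):
        raise ValueError("arguments must be a JSON object")
    return call

raw = '{"name": "edit_file", "arguments": {"path": "src/main.py", "patch": "..."}}'
call = parse_tool_call(raw)
print(call["name"])  # edit_file
```

A model trained on harness-native traces fails this gate less often, which is exactly the property the paragraph above describes.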
Carnice-9b is tuned for terminal-heavy tasks. This includes navigating directories, executing shell scripts, and interpreting error logs to self-correct. When paired with a browser tool, it can handle web-assisted research tasks, extracting data to inform its next terminal-based action.
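The execute-and-self-correct loop described above can be sketched as follows. This is a simplified illustration, not the Hermes harness's actual implementation: each action is run, and the exit code plus combined output would be fed back to the model as its next observation.

```python
import subprocess

def run_step(cmd: list[str]) -> tuple[int, str]:
    """Run one terminal action and capture everything the model needs
    to see: the exit code and the combined stdout/stderr log."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr

# On a nonzero exit code, the error log is returned to the model so it
# can interpret the failure and propose a corrected command.
code, log = run_step(["python", "-c", "print('ok')"])
```

The key design point is capturing stderr alongside stdout: error logs are the signal the model uses to self-correct.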
Generic models often fail at function calling when the schema becomes complex. Carnice-9b’s training centered on "harness-native action structure," meaning it has a higher success rate in generating valid tool calls that the Hermes runtime can actually parse and execute without human intervention.
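"Valid tool calls the runtime can parse" means more than well-formed JSON: the arguments must also match the tool's declared schema. A minimal sketch of that check, using a hypothetical tool definition (real harnesses typically declare tools in JSON Schema, which is stricter than this):

```python
# Hypothetical tool definition; real harnesses usually use JSON Schema.
TOOL_SCHEMA = {
    "name": "read_file",
    "parameters": {"path": str, "max_bytes": int},
    "required": ["path"],
}

def validate_arguments(args: dict, schema: dict) -> list[str]:
    """Return a list of problems with a tool call's arguments (empty = valid)."""
    errors = []
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required argument: {field}")
    for field, value in args.items():
        expected = schema["parameters"].get(field)
        if expected is None:
            errors.append(f"unknown argument: {field}")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

print(validate_arguments({"path": "README.md"}, TOOL_SCHEMA))  # []
```

A model with a higher valid-call rate spends fewer turns bouncing off this validator, which is what makes unattended execution practical.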
To run Carnice-9b for Hermes agent locally, you need to consider the VRAM footprint of the 9B parameter count alongside the overhead of the agent harness itself.
For most practitioners, Q4_K_M is the fastest way to run Carnice-9b for Hermes agent locally without a significant drop in reasoning. However, if the agent is performing complex multi-file refactoring, the Q5_K_M quantization (approx. 6.9 GB) is the "sweet spot" for maintaining tool-calling precision while keeping inference speeds high.
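The trade-off above can be automated: pick the highest-quality format that fits your card, using the footprint figures from the table earlier on this page. A small sketch; the 1 GB headroom default for KV cache and harness overhead is an assumption.

```python
# Footprints from the quantization table above, ordered low -> high quality.
QUANTS = [
    ("Q2_K", 4.1), ("Q4_K_M", 6.0), ("Q5_K_M", 6.9),
    ("Q6_K", 8.0), ("Q8_0", 10.2), ("FP16", 18.8),
]

def pick_quant(vram_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the best-quality format that fits, leaving headroom for the
    KV cache and agent harness (the 1 GB default is an assumption)."""
    usable = vram_gb - headroom_gb
    best = None
    for fmt, need in QUANTS:
        if need <= usable:
            best = fmt
    if best is None:
        raise ValueError("not enough VRAM even for Q2_K")
    return best

print(pick_quant(8.0))   # Q5_K_M on an 8 GB card
```

On an 8 GB card this lands on Q5_K_M, matching the "sweet spot" recommendation above; a 12 GB card can step up to Q8_0.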
Serving the model through a dedicated inference engine (llama.cpp or vLLM) is advised.

When evaluating Carnice-9b for Hermes agent performance, it is best compared against other "agent-first" or small-footprint models.
If your goal is to build a local AI software engineer that can actually interact with your file system, Carnice-9b is currently one of the most optimized 9B models for that specific task.