Hugging Face
Minimal code-writing agents from Hugging Face — the smallest agent framework that works.
GitHub Stars: 27.4K
Contributors: 203
npm / Week: n/a
PyPI / Month: 582.8K
Smolagents is a minimal agent runtime from Hugging Face that deliberately does less than its peers. Released in 2024 under the Apache 2.0 license, it’s a Python library built around a core insight: agents that write Python code as their action format are more expressive than agents constrained to JSON tool calls. The framework’s tagline — “the smallest agent framework that works” — isn’t marketing hype. The core agent logic fits in roughly 1,000 lines of documented code, which you can read end to end in an afternoon.
The step loop follows the ReAct pattern: the model reasons, acts, observes the result, and repeats. Where most frameworks layer abstractions on top of abstractions (chains, graphs, state machines, middleware), Smolagents keeps the abstraction surface flat. You get an Agent class, a Tool class, and a model interface. That’s it. The result is a framework that suits teams who want to understand exactly what their agent is doing, modify behavior without fighting framework internals, and run agents that can handle arbitrary Python logic.
The numbers back up its traction: 27,349 GitHub stars, 203 contributors, and 582,776 monthly PyPI downloads at the time of writing. That’s a strong signal for a framework that’s still under two years old. It’s maintained by the same Hugging Face team that delivers transformers, datasets, and the Hub — a team that knows how to build and maintain open source at scale.
Smolagents competes most directly with LangChain’s code-execution agents and CrewAI’s agent workflows. But its philosophy is closer to the simplicity of Pydantic AI or the original AutoGen concept: a thin layer over the model that gives the model agency to act, rather than a sprawling middleware platform.
Smolagents is imperative and code-first. You don’t define a graph, a chain, or a configuration file. You write Python.
At the core are three abstractions:
transformers models, Hugging Face inference endpoints, OpenAI, Anthropic, and any provider via LiteLLM. The model handles tokenization, generation, and parsing of the agent’s action (code) back into the loop.
The control flow is straightforward:
There is no built-in state machine, no branching, no subgraph composition. If you want multi-step planning with conditional logic, you express it in the Python code the agent writes. This gives you maximum flexibility at the cost of some guardrails.
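As a rough illustration of this kind of loop, the sketch below runs a model-writes-code cycle using only the standard library. It is not the Smolagents implementation: `fake_model`, `FinalAnswer`, and `run_agent` are hypothetical names, the "model" is a scripted stand-in, and plain `exec` is used with no sandboxing.

```python
import contextlib
import io


def fake_model(messages):
    """Scripted stand-in for an LLM (hypothetical): returns the next code action."""
    if not any(m.startswith("Observation:") for m in messages):
        # First step: compute something and print it so it becomes an observation.
        return "x = sum(n * n for n in range(1, 4))\nprint(x)"
    # Later step: hand the stored value back as the final answer.
    return "final_answer(x)"


class FinalAnswer(Exception):
    """Raised by the final_answer helper to stop the loop."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def run_agent(task, model, max_steps=5):
    def final_answer(value):
        raise FinalAnswer(value)

    namespace = {"final_answer": final_answer}  # variables persist across steps
    messages = [f"Task: {task}"]
    for _ in range(max_steps):
        code = model(messages)  # the model writes code as its action
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, namespace)  # unsafe outside a sandbox
        except FinalAnswer as done:
            return done.value
        except Exception as err:
            # Errors are fed back so the model can correct itself next step.
            messages.append(f"Observation: error: {err!r}")
            continue
        messages.append(f"Observation: {stdout.getvalue().strip()}")
    return None  # step budget exhausted


answer = run_agent("Sum the squares of 1 to 3", fake_model)
```

Any branching or planning lives in the code the model writes, which is why the loop itself can stay this small.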
For multi-agent scenarios, you can create multiple CodeAgent instances, each with different tools or models, and coordinate them in your own orchestration code. Apart from letting one agent call others passed to it as managed agents (the managed_agents argument), Smolagents provides no dedicated multi-agent protocol; that’s a deliberate choice to keep the core small.
Code-writing agents. Smolagents’ defining capability is that agents output Python code, not JSON tool calls. This matters because code is far more expressive. An agent can import libraries, loop over collections, branch on conditions, chain operations, and handle edge cases inline. For tasks like math reasoning, data analysis, search, or file manipulation, code agents outperform JSON-call agents on many benchmarks. The agent can write from math import sqrt; result = sqrt(42) instead of figuring out how to nest a square root inside a JSON tool call structure.
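To make the difference concrete, here is a small sketch comparing the two action formats on the same task: summing the square roots of a list. The dispatcher, the `TOOLS` table, and the JSON shape are assumptions made up for illustration, not any framework's actual wire format.

```python
import json
from math import sqrt

# Hypothetical tool table for the JSON-style agent.
TOOLS = {"sqrt": sqrt, "add": lambda a, b: a + b}
calls = []  # records every JSON dispatch


def run_json_call(call_text):
    """Dispatch one JSON tool call of the assumed form {"tool": ..., "args": [...]}."""
    call = json.loads(call_text)
    calls.append(call["tool"])
    return TOOLS[call["tool"]](*call["args"])


values = [9, 16, 25]

# JSON style: one dispatch per operation, with the caller threading results through.
roots = [run_json_call(json.dumps({"tool": "sqrt", "args": [v]})) for v in values]
json_total = 0.0
for root in roots:
    json_total = run_json_call(json.dumps({"tool": "add", "args": [json_total, root]}))

# Code style: a single action expresses the loop and the aggregation together.
code_action = "total = sum(sqrt(v) for v in values)"
namespace = {"sqrt": sqrt, "values": values}
exec(code_action, namespace)  # unsafe outside a sandbox
code_total = namespace["total"]
```

Both paths arrive at the same total, but the JSON path needs six dispatches where the code path needs one action.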
Tiny core. The agent logic lives in agents.py — roughly 1,000 lines with inline comments. This is not just a design talking point; it has real implications. You can debug by reading the source. You can customize behavior by forking the file. You can understand the prompt templates, the action parsing, and the loop invariants in an afternoon. For teams evaluating frameworks, this transparency reduces risk.
Sandboxed execution. Code execution is inherently dangerous when the agent runs arbitrary Python. Smolagents provides multiple sandbox backends: E2B, Docker, Blaxel, Modal, and a Pyodide/Deno WebAssembly sandbox. You can choose the level of isolation that fits your risk profile. For trusted inputs (e.g., internal research assistants), local execution may suffice. For production systems with user-provided prompts, use a sandbox backend like E2B that runs each agent session in a fresh environment.
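For the local-execution case, one lightweight guard is to reject generated code that imports modules outside an allowlist before running it. The sketch below shows that idea with the standard `ast` module. It is an illustration only: `ALLOWED_IMPORTS` and `disallowed_imports` are hypothetical names, this is not how Smolagents implements its checks, and a static check like this does not replace a real sandbox (dynamic tricks such as `__import__` bypass it).

```python
import ast

# Hypothetical allowlist of top-level modules the agent's code may import.
ALLOWED_IMPORTS = {"math", "statistics"}


def disallowed_imports(code, allowed=ALLOWED_IMPORTS):
    """Return the sorted top-level module names imported by `code` that are not allowed."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None; treat them as an unnamed module.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                found.add(root)
    return sorted(found)


safe = disallowed_imports("from math import sqrt\nresult = sqrt(42)")
unsafe = disallowed_imports("import math\nimport os\nfrom subprocess import run")
```

A caller would refuse to execute any snippet for which the returned list is non-empty, and still rely on an isolated backend for untrusted prompts.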
Hub integration. Tools and agents can be pushed to or pulled from the Hugging Face Hub. This enables sharing of curated tools across teams or the open source community. It also means you can start building with a tool library that already includes search, web scraping, file I/O, and more.
Model-agnostic. Smolagents works with any LLM that can be called via a Python API. It ships with support for transformers (local), Hugging Face inference providers, OpenAI, Anthropic, and LiteLLM. You are not locked into Hugging Face’s model ecosystem, though it integrates naturally if you are already using transformers.
Multi-agent capability. While not a first-class abstraction, you can compose multiple agents by running them in parallel or sequence from your own code. This is enough for most multi-agent patterns (research and summarization, iterative refinement, tool dispatching) without the overhead of a dedicated orchestration layer.
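A research-and-summarization pattern of this kind can be sketched in a few lines of orchestration code. The example assumes only that each agent exposes a `run(task)` method; `StubAgent` is a hypothetical stand-in so the sketch runs without a model, and real agents would take its place.

```python
from concurrent.futures import ThreadPoolExecutor


class StubAgent:
    """Hypothetical stand-in exposing run(task), as an agent object would."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior

    def run(self, task):
        return self.behavior(task)


researchers = [
    StubAgent("web", lambda topic: f"web notes on {topic}"),
    StubAgent("papers", lambda topic: f"paper notes on {topic}"),
]
summarizer = StubAgent("summary", lambda notes: f"SUMMARY({notes})")


def research_then_summarize(topic):
    # Parallel stage: every researcher works on the same topic.
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(lambda agent: agent.run(topic), researchers))
    # Sequential stage: the summarizer sees all notes, in researcher order.
    return summarizer.run(" | ".join(notes))


report = research_then_summarize("tokyo")
```

Threads suit this case because agent runs are typically dominated by network calls to the model provider.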
Research agents. This is where Smolagents shines. A researcher wants an agent that can fetch documents from the web, process them with pandas or nltk, compute statistics, and produce a summary. A code agent can import those libraries directly, chain operations, and handle errors gracefully — all in one Python script generated automatically. Frameworks that only support discrete JSON tool calls would require splitting that workflow into many individual tool invocations, which is slower and more brittle.
Teaching and exploration. Smolagents is frequently used in workshops and courses to teach agent fundamentals. Because the core code is small and readable, students can trace how the model’s output becomes executed code. It’s a live demonstration of the ReAct pattern. This use case drives much of the framework’s popularity on GitHub.
Hugging Face stack integration. If you already deploy models via transformers or use the Hub for model hosting, Smolagents slots in with zero additional infrastructure. You can run a CodeAgent with a local Mistral or Llama model, using the same tokenizer and generation configuration you already have.
Internal copilots and coding assistants. Teams building internal tools that need to automate data processing, report generation, or system administration find code agents effective. The agent can call shell commands, manipulate files, query databases, and produce output — all in one run.
Poor fit: Smolagents is not a platform for large-scale customer-facing chatbots that require high availability, observability, and load balancing. The framework intentionally lacks production tooling: no built-in tracing, no queue management, no rate limiting. You bring your own observability. If your primary need is a robust conversational system with fallback flows and latency SLAs, look elsewhere.
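Bringing your own observability can start small. The sketch below wraps any run callable with timing and logging using only the standard library; `traced_run` and the `agent_runs` logger name are hypothetical, and a production setup would more likely export to a tracing backend.

```python
import logging
import time

logger = logging.getLogger("agent_runs")  # hypothetical logger name


def traced_run(run, task):
    """Call run(task), log outcome and duration, and return (result, seconds)."""
    start = time.perf_counter()
    try:
        result = run(task)
    except Exception:
        # Log the traceback, then let the caller decide how to recover.
        logger.exception("agent run failed: task=%r", task)
        raise
    elapsed = time.perf_counter() - start
    logger.info("agent run ok: task=%r seconds=%.3f", task, elapsed)
    return result, elapsed


# Usage with a stand-in callable; an agent's bound run method would be passed the same way.
result, elapsed = traced_run(lambda task: task.upper(), "ping")
```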
Install:
    pip install smolagents
The smallest meaningful example (20 lines or fewer):
    from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

    # InferenceClientModel was named HfApiModel in older smolagents releases.
    agent = CodeAgent(
        tools=[DuckDuckGoSearchTool()],
        model=InferenceClientModel(),  # uses your HF token or default inference endpoint
    )

    agent.run("What is the current population of Tokyo, and how does it compare to 10 years ago?")
This agent will: receive the question, generate a plan, write Python code that calls the search tool, parse the results, compute the answer, and return it. No configuration files, no YAML, no extra setup.
What you need around it:
transformers or use OpenAI/Anthropic via LiteLLM.
Docs and community: The official documentation is at [huggingface.co/docs/smolagents](https://huggingface.co/docs/smolagents). The GitHub repository at huggingface/smolagents has extensive examples under examples/ and a dedicated AGENTS.md guide. The Hugging Face Discord and forum are active for troubleshooting.
Smolagents vs. LangChain. LangChain is a battle-tested framework with rich production tooling: model providers, memory, callbacks, tracing with LangSmith, and a large ecosystem of integrations. However, it is also heavy. The abstraction layers (chains, agents, tools, retrievers, callbacks) take significant time to learn. Smolagents is the opposite: minimal abstraction, maximum readability. Choose Smolagents when you want to understand what your agent is doing and you value simplicity over ecosystem breadth. Choose LangChain when you need out-of-the-box observability, versioned chains, or enterprise integrations.
Smolagents vs. CrewAI. CrewAI focuses on multi-agent orchestration with roles, processes, and task delegation. It is good for scenarios where different agents have distinct personas and you want hierarchical workflows. Smolagents does not provide role abstractions. If you need a “manager agent” that delegates to “worker agents” with explicit task lists, CrewAI is a better choice. If you simply want a single powerful code agent that can call multiple tools and reason step by step, Smolagents is cleaner and faster to set up.
Smolagents vs. Pydantic AI. Pydantic AI shares Smolagents’ philosophy of minimalism and Python-first design. Both produce agents that are easy to reason about and customize. The main difference is execution format: Pydantic AI uses structured tool calls (JSON), while Smolagents uses generated Python. For tasks where arbitrary computation is required, Smolagents’ code agents are more capable. For tasks that must strictly adhere to a predefined API surface, Pydantic AI’s type-safety is an advantage.
The tradeoff is clear: Smolagents trades production polish for transparency and power. It is a framework you build on top of, not inside of. If your team values understanding every line of the runtime and wants an agent that can write arbitrary Python, it is the right choice. If you need a turnkey production platform, you will need to supplement it with your own infrastructure.
What the framework gives you out of the box, in plain language.
Agents produce Python code as their action — more expressive than JSON tool calls.
Roughly 1,000 lines of well-commented code you can actually read and modify.
Multiple sandbox backends, including E2B and Docker, for safe code execution.
The jobs this framework is best suited for.
Agents that can chain Python operations naturally — math, search, file manipulation.
A readable codebase that's ideal for learning how agent frameworks work.
Native fit when you're already running open-weight models via transformers.
Close alternatives worth a look before you decide.
Composable building blocks for LLM apps: chains, agents, retrievers, and integrations.
Composable LLM building blocks
Stars: 137.0K
npm / wk: 2.2M
PyPI / mo: 241.8M

Multi-agent crews with role-based prompts and explicit task hand-offs.
Role-based multi-agent crews
Stars: 51.6K
npm / wk: n/a
PyPI / mo: 9.6M

Type-safe agents with structured outputs from the Pydantic team.
Type-safe Python agents
Stars: 17.1K
npm / wk: n/a
PyPI / mo: 39.1M