OpenAI
OpenAI's production-ready agent SDK with tracing, handoffs, and structured outputs.
GitHub Stars
26.4K
Contributors
272
npm / Week
—
PyPI / Month
29.8M
OpenAI Agents SDK is a lightweight, open-source Python and TypeScript framework for building production-grade multi-agent applications. Released in March 2025 under the MIT license, it is maintained by OpenAI and serves as the official successor to the experimental Swarm project. With over 26,000 GitHub stars, 272 contributors, and nearly 30 million monthly PyPI downloads, it has quickly become one of the most adopted agent frameworks in the ecosystem.
The SDK targets a specific gap: teams that want to move beyond single-turn LLM calls into multi-step, tool-using agent workflows without drowning in abstraction layers. Its design philosophy is minimalism. It provides exactly four core primitives — Agent, Runner, Handoff, and Guardrail — and leaves everything else to standard Python control flow. This makes it a natural fit for engineers who prefer explicit composition over declarative DSLs or graph-based orchestration.
In the agent framework landscape, OpenAI Agents SDK competes directly with LangGraph, CrewAI, and AutoGen. Where LangGraph emphasizes graph-based state machines and CrewAI focuses on role-based teams, OpenAI Agents SDK leans into lightweight, code-first orchestration with deep integration into OpenAI’s model ecosystem and observability stack. It is built by the same team that delivers GPT-4o and the Responses API, which means first-class access to OpenAI-specific features like structured outputs, streaming, and built-in tracing in the OpenAI dashboard.
The programming model is imperative and Python-native. You define agents as objects, decorate functions as tools, and orchestrate execution using ordinary if/else blocks, loops, and function calls. There is no graph definition, no YAML config files, and no abstract workflow engine between you and the LLM.
Core abstractions:
Runner.@function_tool becomes a tool with automatic Pydantic-based input validation and JSON schema generation. Tools can also be MCP servers or OpenAI-hosted tools.Control flow is straightforward: you call Runner.run(agent, input) and the SDK loops — invoking tools, sending results back to the LLM, and repeating until the LLM produces a final output or hits a guardrail. There is no support for cyclic graphs or conditional branching inside the runner; you implement that logic in your own Python code by checking outputs and calling different agents.
The SDK is provider-agnostic in principle — it supports the OpenAI Responses API, Chat Completions API, and over 100 third-party models through the openai Python client. In practice, the best experience is with OpenAI models because tracing, structured outputs, and streaming optimizations are tested and tuned for those endpoints.
Multi-Agent Handoffs are a first-class feature. Instead of building a router system from scratch, you define handoffs directly on the agent. When the LLM determines it needs a specialist, the runner transparently transfers control to the target agent with full conversation history. This makes patterns like customer support triage (greeter -> billing -> technical support) a few lines of code.
Built-in Tracing is enabled by default. Every agent run, tool call, guardrail check, and handoff is recorded and visible in the OpenAI dashboard. You can inspect step-by-step execution, latency breakdowns, token usage, and error paths without setting up any external observability tool. For teams already using OpenAI, this eliminates the need to integrate a third-party tracing solution for agent debugging.
Guardrails run input and output checks in parallel with agent execution. They are defined as Python functions that return a GuardrailResult with a pass/fail status and optional error message. This allows you to block unsafe content, enforce format constraints, or validate business rules before the model response reaches the user.
Streaming is supported through Runner.run_streamed(), which yields intermediate events (tool calls, partial text, handoffs) as they happen. This is critical for real-time user experiences like chat interfaces or live dashboards.
Type Safety comes from Pydantic integration. Tool schemas are inferred from Python type hints, and structured outputs can be enforced using output_type on the agent. The TypeScript SDK mirrors this approach with Zod.
Self-Hostable and Cloud-Hosted: The Python package is installable via pip and runs anywhere Python runs. The tracing backend defaults to OpenAI’s cloud, but you can point it to a self-hosted endpoint if needed. The April 2026 update added sandbox agents that run inside isolated containers for long-running tasks.
Customer-Facing Assistants are the primary use case. Teams deploy GPT-4o agents that handle initial inquiries and hand off to specialist agents (e.g., refunds, technical support) when needed. The built-in tracing gives product teams visibility into failure modes and conversation flow.
Internal Copilots on OpenAI are another common pattern. Companies build agents that answer internal questions about company policies, codebases, or datasets, using guardrails to keep responses on-brand and within compliance boundaries. The SDK’s low overhead makes it easy to spin up a new copilot for each department without heavy scaffolding.
Lightweight Agent Prototypes benefit from the minimal boilerplate. If you need a single agent with a few tools and no multi-agent orchestration, the SDK is often the fastest path to a working prototype compared to frameworks that impose graph or team abstractions.
Code Generation Pipelines: Developers use the SDK to build agents that write code, run it in a sandbox, evaluate the output, and iterate. The sandbox agents feature (April 2026) enables this without exposing the host system.
Poor Fit Cases: Frameworks with deep graph-based reasoning (LangGraph) or role-based crew management (CrewAI) are better for complex, non-linear workflows that require cycles, conditional branching, or human-in-the-loop at every step. OpenAI Agents SDK assumes explicit linear or tree-like delegation.
Install the Python package:
1pip install openai-agents
Set your OpenAI API key as the OPENAI_API_KEY environment variable.
The smallest meaningful example:
1from agents import Agent, Runner23agent = Agent(4 name="Assistant",5 instructions="You are a helpful assistant.",6)78result = Runner.run_sync(agent, "What is the capital of France?")9print(result.final_output)
This creates an agent with no tools, runs it once, and prints the output. To add a tool, define a function with @function_tool and pass it in the tools list.
Prerequisites: An OpenAI API key or an endpoint for a compatible model. No vector store or external observability tool is required for basic use, but tracing requires internet access to OpenAI’s dashboard.
Documentation lives at [openai.github.io/openai-agents-python](https://openai.github.io/openai-agents-python/). The GitHub repository at [github.com/openai/openai-agents-python](https://github.com/openai/openai-agents-python) contains examples and the TypeScript version.
vs LangChain/LangGraph: LangGraph excels at cyclic, state-machine-like workflows with branching and conditional transitions. OpenAI Agents SDK is far simpler for linear or tree-like delegation. If you need a DAG or dynamic routing based on intermediate results, LangGraph gives you more control. If you want minimal setup for a straightforward multi-agent pipeline, OpenAI Agents SDK wins on developer velocity.
vs CrewAI: CrewAI provides role-based teams with built-in task assignments and process definitions. It is opinionated about how agents collaborate (sequential, hierarchical, etc.). OpenAI Agents SDK is more flexible but requires you to write the coordination logic. For teams that want “set up a team and let it figure out the plan,” CrewAI may be easier. For teams that want deterministic, fine-grained control over handoffs and tool selection, OpenAI Agents SDK is the better fit.
vs AutoGen: AutoGen is agent-centric with strong support for inter-agent conversation patterns and human-in-the-loop. It has a larger feature surface but a steeper learning curve. OpenAI Agents SDK is intentionally smaller and faster to learn, but offers less built-in support for conversation management and multi-turn multi-agent dialogue.
The bottom line: Choose OpenAI Agents SDK when your workflow is roughly linear or tree-shaped, you already use OpenAI models, and you want a framework that stays out of your way. Choose alternatives when you need non-linear graph traversal, role-based assignment, or extensive third-party integrations.
What the framework gives you out of the box, in plain language.
First-class multi-agent transfer with preserved context.
Every run shows up in the OpenAI dashboard with full step traces.
Run input and output checks in parallel to block unsafe content.
The jobs this framework is best suited for.
Production agents using GPT models, with handoffs to specialist agents.
Agents that use guardrails to keep outputs on-brand and on-topic.
Minimal boilerplate when you only need a single agent with tools.
Side-by-Side
Add a second or third framework and see stars, downloads, and capabilities lined up next to each other.
Close alternatives worth a look before you decide.
Composable building blocks for LLM apps — chains, agents, retrievers, and integrations.
Composable LLM building blocks
Stars
137.0K
npm / wk
2.2M
PyPI / mo
241.8M
Stateful, graph-based agent workflows with first-class human-in-the-loop.
Complex, stateful agent graphs
Stars
32.3K
npm / wk
—
PyPI / mo
49.0M
Multi-agent crews with role-based prompts and explicit task hand-offs.
Role-based multi-agent crews
Stars
51.6K
npm / wk
—
PyPI / mo
9.6M
Conversational multi-agent simulations and orchestration from Microsoft Research.
Conversational multi-agent simulations
Stars
58.1K
npm / wk
—
PyPI / mo
1.5M