Mastra AI
TypeScript-first agent framework with workflows, RAG, and built-in evals.
GitHub Stars: 24.0K
Contributors: 441
npm / Week: 961.9K
PyPI / Month: n/a
Mastra is a TypeScript-native agent framework built and maintained by Mastra AI, the team behind Gatsby. It is licensed under Apache 2.0 and was first released in 2024. The framework occupies the intersection of orchestration, agent runtime, and workflow categories, competing directly with LangChain, CrewAI, and LangGraph for teams that want to build AI agents without leaving the JavaScript ecosystem.
The core problem Mastra solves is this: Python has dominated agent development for years, but the majority of modern SaaS applications are built in TypeScript. Mastra gives TypeScript teams first-class primitives for agents, workflows, and retrieval-augmented generation (RAG) without requiring a separate Python service, awkward bridge APIs, or dual-language maintenance. It is opinionated by design, bundling agents, evals, tracing, and a local development studio into a single @mastra/core package.
With nearly 24,000 GitHub stars, 441 contributors, and over 960,000 weekly npm downloads, Mastra has quickly become one of the most widely adopted TypeScript agent frameworks. Its design philosophy is pragmatic: provide enough structure to prevent common failure modes (loose typing, unmonitored agent loops, opaque execution) while keeping the API surface small enough that a single developer can ship a production agent in a few hours.
Mastra is code-first and TypeScript-first. You build agents by importing classes from @mastra/core and composing them with plain functions and Zod schemas. The framework is neither fully declarative nor purely imperative: it uses a step-based workflow model that lets you define control flow explicitly while still leveraging agent reasoning where you need it.
The core abstractions are:
Workflows: chain steps with .step(), branch with .condition(), parallelize with .parallel(), and add retry logic, human approval gates, and timeout behaviors. Workflows can call agents at any step, but they also support pure deterministic code steps. This hybrid model gives you explicit control over execution order without losing the flexibility of agentic reasoning.

Mastra uses a local development server (npm run dev) that exposes a web UI called Mastra Studio. Studio lets you test agents, inspect traces, swap models, and adjust parameters in real time. In production, you deploy Mastra as a standalone server, embed it in a Next.js or Express app, or deploy to edge runtimes like Cloudflare Workers.
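As a rough illustration of the step-based model (sequential steps, branching, and parallel execution), here is a library-independent sketch. It does not use Mastra's API: the `sequence`, `branch`, and `parallel` helpers and all step names are hypothetical, chosen only to show how explicit control flow can be composed from plain functions.

```typescript
// Library-independent sketch of a step-based workflow; this is NOT Mastra's API.
// A step is an async function from an input to an output.
type Step<I, O> = (input: I) => Promise<O>;

// Run two steps one after the other.
function sequence<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input) => second(await first(input));
}

// Choose one of two steps based on a predicate over the input.
function branch<I, O>(test: (input: I) => boolean, yes: Step<I, O>, no: Step<I, O>): Step<I, O> {
  return (input) => (test(input) ? yes(input) : no(input));
}

// Run several steps on the same input concurrently; results keep the step order.
function parallel<I, O>(...steps: Step<I, O>[]): Step<I, O[]> {
  return (input) => Promise.all(steps.map((step) => step(input)));
}

// Hypothetical deterministic steps; an agent call could sit behind the same signature.
const normalize: Step<string, string> = async (text) => text.trim().toLowerCase();
const countWords: Step<string, number> = async (text) => text.split(/\s+/).length;
const countChars: Step<string, number> = async (text) => text.length;
const shortPath: Step<string, string> = async (text) => `short:${text}`;
const longPath: Step<string, string> = async (text) => `long:${text}`;

const measure = sequence(normalize, parallel(countWords, countChars));
const route = sequence(normalize, branch((t) => t.length < 10, shortPath, longPath));
```

Because every step shares one signature, a deterministic function and an LLM-backed call are interchangeable from the workflow's point of view.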
Every tool input, tool output, and structured response is validated with Zod. This is not optional: if you define a tool schema, the framework enforces it at runtime. For production systems where incorrect outputs can cause downstream failures, this is a significant safety net. Mastra also supports structured output from the LLM itself, letting you emit typed objects instead of raw text.
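The idea of validating tool input before the handler runs can be sketched without any dependencies. The example below uses a hand-rolled parser as a stand-in for a Zod schema; `defineTool`, `parseSearchInput`, and `searchTool` are hypothetical names for illustration, not part of Mastra.

```typescript
// Hand-rolled stand-in for a Zod schema so the sketch has no dependencies.
// In a real project the schema would be declared with Zod instead.
type Parser<T> = (value: unknown) => T;

interface SearchInput {
  query: string;
  maxResults: number;
}

// Throws on the first field that does not match the expected shape.
const parseSearchInput: Parser<SearchInput> = (value) => {
  if (typeof value !== "object" || value === null) {
    throw new Error("input must be an object");
  }
  const record = value as Record<string, unknown>;
  const query = record["query"];
  const maxResults = record["maxResults"];
  if (typeof query !== "string" || query.length === 0) {
    throw new Error("query must be a non-empty string");
  }
  if (typeof maxResults !== "number" || !Number.isInteger(maxResults)) {
    throw new Error("maxResults must be an integer");
  }
  return { query, maxResults };
};

// Hypothetical tool wrapper: validate first, run the handler only on valid input.
function defineTool<I, O>(parse: Parser<I>, handler: (input: I) => O) {
  return (raw: unknown): O => handler(parse(raw));
}

const searchTool = defineTool(parseSearchInput, ({ query, maxResults }) =>
  Array.from({ length: maxResults }, (_, i) => `${query} result ${i + 1}`),
);
```

Calling `searchTool({ query: "mastra", maxResults: "2" })` throws before the handler runs, which is the behavior that keeps malformed model output from reaching downstream code.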
Workflows in Mastra are persistent by default. State is saved between steps. If a workflow suspends (waiting for human approval, for example), it can be resumed later without losing context. You can also rewind and replay any step for debugging. Retries, exponential backoff, and branching are first-class concepts, not afterthoughts.
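To make the retry behavior concrete, here is a minimal sketch of retries with exponential backoff. It is a generic implementation of the technique, not Mastra's internals; `withRetry`, `backoffDelayMs`, and the default values (100 ms base, 5000 ms cap, 3 attempts) are assumptions chosen for the example.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `step` until it succeeds or `maxAttempts` is reached, waiting between failures.
// `wait` is injectable so tests can record delays instead of sleeping.
async function withRetry<T>(
  step: () => Promise<T>,
  maxAttempts = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await step();
    } catch (error) {
      lastError = error;
      // No wait after the final failed attempt.
      if (attempt < maxAttempts - 1) await wait(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Capping the delay keeps a long retry sequence from stalling a workflow indefinitely; production systems often add random jitter as well, which is omitted here for brevity.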
Evals are part of the framework, not an add-on. You write them alongside your agents using the same TypeScript codebase. They can run locally during development, in CI pipelines, or against production traces. Mastra provides model-graded eval helpers (e.g., “is this answer faithful to the source?”), rule-based checks (e.g., “does the output contain PII?”), and statistical methods. This tight integration means you can validate answer quality against a golden dataset before every deployment.
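A rule-based check of the kind mentioned above ("does the output contain PII?") might look like the following sketch. The `EvalResult` shape, the `containsNoPii` function, and the two regular expressions are hypothetical and deliberately simple; they are not Mastra's eval helpers.

```typescript
interface EvalResult {
  name: string;
  score: number; // 1 = pass, 0 = fail
  reason: string;
}

// Deliberately simple patterns; real PII detection needs much broader coverage.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/,
  usPhone: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/,
};

// Scores an output 1 when no pattern matches, 0 otherwise, and names the matches.
function containsNoPii(output: string): EvalResult {
  const hits = Object.entries(PII_PATTERNS)
    .filter(([, pattern]) => pattern.test(output))
    .map(([label]) => label);
  return {
    name: "no-pii",
    score: hits.length === 0 ? 1 : 0,
    reason: hits.length === 0 ? "no PII patterns matched" : `matched: ${hits.join(", ")}`,
  };
}
```

Returning a reason alongside the score makes a failing eval actionable when it shows up in a CI log.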
For complex workloads, Mastra supports supervisor agents that delegate tasks to specialized sub-agents. The coordination model is explicit: the supervisor receives a goal, breaks it into subtasks, assigns each to a child agent, and aggregates results. This is more structured than emergent multi-agent chat loops, which is beneficial for deterministic, auditable processes.
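The explicit coordination model described above (receive a goal, split it, delegate, aggregate) can be sketched with plain functions. Everything here is hypothetical and library-independent: the sub-agents are stubs, and the plan is hard-coded where a real supervisor would ask a model to produce it.

```typescript
// A sub-agent is modelled as an async function from a task to a result string.
type SubAgent = (task: string) => Promise<string>;

interface Subtask {
  assignee: string; // key into the sub-agent registry
  task: string;
}

// Hypothetical sub-agents; in practice each would wrap an LLM-backed agent.
const subAgents: Record<string, SubAgent> = {
  research: async (task) => `notes on ${task}`,
  writer: async (task) => `draft for ${task}`,
};

// The plan is hard-coded here; a real supervisor would ask a model to produce it.
function plan(goal: string): Subtask[] {
  return [
    { assignee: "research", task: goal },
    { assignee: "writer", task: goal },
  ];
}

// Delegate each subtask in order and aggregate the labelled results.
async function supervise(goal: string): Promise<string> {
  const results: string[] = [];
  for (const { assignee, task } of plan(goal)) {
    const agent = subAgents[assignee];
    if (!agent) throw new Error(`no sub-agent registered for "${assignee}"`);
    // Record who produced what so the run can be audited afterwards.
    results.push(`[${assignee}] ${await agent(task)}`);
  }
  return results.join("\n");
}
```

Because delegation goes through a fixed registry and an explicit plan, every result can be traced back to the sub-agent that produced it.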
Streaming is built into the agent runtime. You can stream tokens, tool calls, and intermediate reasoning to a frontend or API consumer. This is important for any interactive application where latency matters, such as chat interfaces or real-time copilots.
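On the consumer side, token streaming usually reduces to iterating an async stream and forwarding each chunk as it arrives. The sketch below uses a fake token generator in place of a model; `fakeTokenStream` and `consume` are hypothetical names, and the code does not depend on Mastra.

```typescript
// Stand-in for a model stream: yields one token at a time.
async function* fakeTokenStream(text: string): AsyncGenerator<string> {
  for (const token of text.split(" ")) {
    // Yield control between tokens, as a network stream would.
    await Promise.resolve();
    yield token;
  }
}

// Forward tokens to a sink as they arrive and return the assembled text.
async function consume(
  stream: AsyncIterable<string>,
  onToken: (token: string) => void,
): Promise<string> {
  const tokens: string[] = [];
  for await (const token of stream) {
    onToken(token);
    tokens.push(token);
  }
  return tokens.join(" ");
}
```

In a chat interface, `onToken` would append to the visible message so the user sees output before the full response is complete.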
OpenTelemetry is baked in. Mastra automatically traces agent calls, tool executions, LLM requests, and workflow steps. You can export traces to any OpenTelemetry-compatible backend (Datadog, Grafana, SigNoz, etc.) or use Mastra’s own observability platform. This makes debugging non-deterministic agent behavior tractable.
Workflows can suspend execution at any step and require a human to approve or reject a tool call before it executes. This is configured per step. The framework provides a webhook and API for external approval systems.
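One way to picture an approval gate is as a small state machine: the tool call is parked in a suspended state and only executes when a reviewer's decision arrives. The sketch below is an assumption-laden illustration, not Mastra's suspend/resume API; `RunState`, `requestToolCall`, `resume`, and `executeTool` are hypothetical.

```typescript
type RunState =
  | { status: "suspended"; pendingCall: { tool: string; args: unknown } }
  | { status: "completed"; result: string }
  | { status: "rejected"; reason: string };

// Hypothetical tool executor used only for this sketch.
function executeTool(tool: string, args: unknown): string {
  return `${tool} executed with ${JSON.stringify(args)}`;
}

// Requesting a call does not execute the tool; it parks the call until someone decides.
function requestToolCall(tool: string, args: unknown): RunState {
  return { status: "suspended", pendingCall: { tool, args } };
}

// Resume a suspended run with the reviewer's decision.
function resume(state: RunState, decision: { approved: boolean; reason?: string }): RunState {
  if (state.status !== "suspended") return state; // nothing to resume
  if (!decision.approved) {
    return { status: "rejected", reason: decision.reason ?? "rejected by reviewer" };
  }
  const { tool, args } = state.pendingCall;
  return { status: "completed", result: executeTool(tool, args) };
}
```

Since the suspended state is plain data, it can be serialized and stored while the approval is pending, then passed to `resume` whenever the external system responds.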
The framework itself is fully open source and self-hostable. Mastra AI also offers a managed cloud platform (Mastra Cloud) for teams that want hosted deployments, observability, and scaling without managing infrastructure.
The most common deployment pattern: a Next.js or Express application that needs AI capabilities such as a support chat, an internal copilot, or a content generator. Mastra integrates as a library within the existing app. You define agents and workflows in the same repository, share types with the rest of the codebase, and deploy via the same CI pipeline. This avoids the operational overhead of running a separate Python microservice.
Teams building Q&A systems over internal documents often struggle with answer quality. Mastra’s built-in evals allow you to define a golden set of question-answer pairs and score each release. If a new model or chunking strategy decreases accuracy on the golden set, the eval catches it before production.
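A golden-set regression gate can be sketched in a few lines. This is a generic illustration rather than Mastra's eval API: `GoldenCase`, `accuracy`, `passesGate`, the exact-match scorer, and the 0.02 tolerance are all assumptions made for the example.

```typescript
interface GoldenCase {
  question: string;
  expected: string;
}

// Exact match after normalisation; a model-graded scorer could replace this function.
function isCorrect(answer: string, expected: string): boolean {
  const norm = (s: string) => s.trim().toLowerCase();
  return norm(answer) === norm(expected);
}

// Fraction of golden cases the system under test answers correctly.
function accuracy(cases: GoldenCase[], answer: (question: string) => string): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => isCorrect(answer(c.question), c.expected)).length;
  return correct / cases.length;
}

// Fail the release if accuracy drops below the baseline by more than `tolerance`.
function passesGate(current: number, baseline: number, tolerance = 0.02): boolean {
  return current >= baseline - tolerance;
}
```

In CI, the baseline would come from the last released version, and a `false` result from `passesGate` would block the deployment.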
Long-running agent jobs that require retries, branching, and human checkpoints. Examples include automated customer refund workflows, content moderation pipelines, and multi-step data enrichment. Mastra’s workflow engine persists state, so even if the server restarts, a workflow can resume from its last completed step.
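Resuming from the last completed step comes down to checkpointing after every step. The sketch below uses an in-memory map as a stand-in for durable storage; `runWorkflow` and `checkpoints` are hypothetical and do not reflect how Mastra stores workflow state.

```typescript
// In-memory stand-in for a durable store (a database table in a real deployment).
const checkpoints = new Map<string, { nextStep: number; state: number }>();

type Step = (state: number) => number;

// Runs steps in order, saving progress after each one. Calling it again with the
// same runId continues from the first step that has not completed yet.
function runWorkflow(runId: string, steps: Step[], initial: number): number {
  const saved = checkpoints.get(runId) ?? { nextStep: 0, state: initial };
  let { nextStep, state } = saved;
  for (const step of steps.slice(nextStep)) {
    state = step(state); // may throw, leaving the last checkpoint intact
    nextStep += 1;
    checkpoints.set(runId, { nextStep, state });
  }
  return state;
}
```

If a step throws (or the process dies) partway through, a second call with the same `runId` skips the steps that already completed and picks up with their saved output.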
Supervisor agents that break a feature request into subtasks (UI, API, database), delegate each to a specialized coding agent, and then integrate the results. Mastra’s structured supervision model keeps this process deterministic enough to audit.
For a single-file agent that calls one LLM with no tools and no persistence, Mastra’s opinionated structure adds unnecessary boilerplate. In those cases, a direct API call or a lighter library like the Vercel AI SDK is more appropriate.
The quickest path to a working agent:
    npm create mastra@latest my-mastra-app
    cd my-mastra-app
    npm run dev
This creates a new project with the default template and starts the local dev server. Open http://localhost:4111 to access Mastra Studio.
The first meaningful agent looks something like this:
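    import { Agent } from "@mastra/core/agent";
    import { openai } from "@ai-sdk/openai";

    // webSearchTool is assumed to be defined elsewhere, for example with
    // createTool from "@mastra/core/tools".
    const agent = new Agent({
      name: "research-agent",
      instructions: "You are a helpful research assistant. Answer questions concisely.",
      // The AI SDK provider reads OPENAI_API_KEY from the environment;
      // the model ID here is only an example.
      model: openai("gpt-4o-mini"),
      // Tools are passed as an object keyed by tool name.
      tools: { webSearchTool },
    });

    const response = await agent.generate("What is the current state of quantum computing?");
    console.log(response.text);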
You need an LLM provider key (OpenAI, Anthropic, etc.) and optionally a vector store key if you use RAG. The framework supports over 40 providers and 3,000+ models through its model router.
Documentation lives at [mastra.ai/docs](https://mastra.ai/docs). The community is active on Discord and GitHub. There are also pre-built templates for common patterns like browser agents, Google Sheet analysis, and database chat.
LangChain is older, has a larger ecosystem, and supports many languages. However, its Python roots mean TypeScript support has historically been a second-class citizen. Mastra is TypeScript-native: every API surface is typed with Zod from day one. For teams that write TypeScript and want to avoid dual-language codebases, Mastra provides a more cohesive experience. LangChain offers more integrations (hosting providers, vector stores, etc.), but Mastra’s ecosystem is growing quickly. Choose LangChain if you need maximum breadth or are already invested in its tooling. Choose Mastra if you value type safety and a leaner, opinionated API.
CrewAI specializes in role-based multi-agent teams. It is also Python-first, with TypeScript support lagging behind. Mastra’s multi-agent model is supervisor-based rather than role-based, which gives more explicit control over delegation logic. If you need emergent, self-organizing agent swarms, CrewAI may be a better fit. If you need deterministic, auditable multi-step workflows with built-in evals, Mastra is stronger.
The Vercel AI SDK is a lighter library focused on streaming and UI integration. It does not provide workflows, evals, or agent memory. Mastra sits above it in abstraction level: you use Mastra when you need structured orchestration and evaluation, not just a thin LLM client. For pure streaming chatbot examples, the AI SDK is sufficient. For production agents that must be monitored, tested, and debugged, Mastra wins.
What the framework gives you out of the box, in plain language.
Tool inputs, outputs, and structured responses are validated with Zod.
Step-based workflows with retries, branching, and human approval gates.
Run scored evaluations against agents and workflows from the same codebase.
The jobs this framework is best suited for.
Add agents and workflows to a Next.js or Node app without leaving the TypeScript stack.
Build a retrieval pipeline and continuously evaluate answer quality against a golden set.
Long-running agent jobs with retries, branching, and human checkpoints.
Close alternatives worth a look before you decide.
Composable LLM building blocks
Composable building blocks for LLM apps: chains, agents, retrievers, and integrations.
Stars: 137.0K | npm / wk: 2.2M | PyPI / mo: 241.8M

Role-based multi-agent crews
Multi-agent crews with role-based prompts and explicit task hand-offs.
Stars: 51.6K | npm / wk: n/a | PyPI / mo: 9.6M