Unsloth AI
A no-code local web app for fine-tuning and running open models on your own hardware.
GitHub Stars
—
Contributors
—
Release Downloads
—
Latest Version
—
The engines this app runs on and the models it ships with, linked into the rest of the research stack.
Unsloth Studio is an open-source, no-code desktop app that lets you train, run, and export open models entirely on your own hardware. It is built and maintained by Unsloth AI, the team behind the popular Unsloth fine-tuning library. What sets it apart from a typical local chat GUI is that it is first and foremost a fine-tuning tool. You do not need to write training scripts, configure loss functions, or debug CUDA errors. The app brings Unsloth’s performance optimizations — 2x faster training with up to 70% lower VRAM usage — into a point-and-click interface.
The app sits at the intersection of several categories: a fine-tuning UI, a local chat client, and an OpenAI-compatible API server. It competes with tools like LM Studio and Ollama for local inference, but no other desktop app offers no-code fine-tuning as a core feature. It also overlaps with Open WebUI and AnythingLLM for chat and document interaction, but those do not expose training capabilities.
Unsloth Studio launched in March 2026 and quickly gained traction on Hacker News and GitHub (the parent Unsloth repository has over 67,000 stars). The team maintains a responsive Discord community and publishes regular updates on their changelog. The app is still in beta, so you should expect rough edges and breaking changes.
What the app gives you out of the box, in plain language.
The jobs this app is best suited for.
Teach an open model your domain, tone, or task without writing training code.
Run open models offline and point tools like Claude Code or Codex at the local endpoint.
Test models side by side in the Model Arena before you commit to one.
Free and open source. Paid Pro and Enterprise speed tiers.

Side-by-Side
Add a second or third app and see stars, downloads, platforms, and capabilities lined up next to each other.
Close alternatives worth a look before you decide.
You can search for, download, and run open models directly from the app. It supports both GGUF (the format used by llama.cpp) and safetensors. The underlying inference engine is llama.cpp with Hugging Face integration, and it handles multi-GPU inference, automatic memory offloading, and model fitting. You can run text, vision, audio, and embedding models.
This is the core differentiator. You select a base model from a supported list (over 500 models including Gemma, Qwen, DeepSeek, and others), upload your data, and hit train. The app uses Unsloth’s custom CUDA kernels to optimize LoRA, FP8, full fine-tuning, and QAT. Training happens with real-time observability — you can watch loss curves and validation metrics in the UI.
After training or downloading, you can chat with the model in a local web UI. The chat interface supports self-healing tool calling (the model can recover from malformed tool calls), unlimited web search, and code execution in Bash and Python. Results from web search are surfaced inside the model’s thinking trace. The code execution environment is sandboxed like Claude Artifacts, so the model can test code, generate files, and verify answers with real computation.
You can export any model — including your fine-tuned ones — to GGUF or 16-bit safetensors. These can then be used with llama.cpp, vLLM, Ollama, or any other inference engine. The app also exposes an OpenAI-compatible API endpoint, meaning you can point tools like Claude Code, Codex, or any custom agent to your local model.
The Model Arena lets you load two models and compare their responses to the same prompt. This is useful for evaluating candidate models before committing to one for a specific task.
Platforms: macOS, Windows (native, no WSL required), and Linux.
Pricing: The app is free and open source for all features. Paid Pro and Enterprise tiers exist for higher speed limits on the API endpoint (e.g., rate limiting, priority inference) but the local training and inference capabilities are not gated.
Hardware requirements:
llama.cpp (GGUF) for inference, but training is not available without a GPU.Installation: Not a one-click store download. You install via a script (curl -fsSL https://unsloth.ai/install.sh | sh on macOS/Linux, or the PowerShell equivalent on Windows). It launches a local web server (default port 8000) that you open in your browser. The installation pulls Python dependencies, Unsloth’s library, and the UI.
License: Other (the main Unsloth library is available under a permissive license; the Studio UI is also open source via the GitHub repository).
Upload PDFs, CSVs, JSON, DOCX, or TXT files. The Data Recipes feature automatically converts these into training datasets via a graph-node workflow — no manual preprocessing. You can then train with LoRA, FP8, full-finetuning, or prefix tuning. The UI shows real-time training metrics. This is the only desktop app that offers this capability out of the box.
Run any downloaded model in a chat interface with autonomous tool calling, web search, and code execution. The model can call Bash and Python, not just JavaScript, making it usable for programming tasks. The API endpoint is OpenAI-compatible, so you can replace a cloud model in your existing toolchain with a local one.
Turns unstructured documents into labeled datasets. For example, upload a PDF of internal documentation and the app can generate question-answer pairs or instruction-following examples for fine-tuning. Manual prep is not required.
Export to GGUF for use with Ollama or llama.cpp, or to safetensors for vLLM. This means your fine-tuned model is not locked into Unsloth’s ecosystem.
Fine-tune on your own data. A startup training a customer support model on internal transcripts can do it entirely offline on a single RTX 4090, with no cloud costs or data-leakage risk. The Data Recipes feature reduces the time from raw documents to usable dataset to hours.
Private local model server. A developer using Claude Code or Codex for code generation can point those tools at Unsloth Studio’s local API endpoint. The model (e.g., Qwen 3.5-4B) runs offline, with tool calling and web search enabled. No data ever leaves the machine.
Model evaluation. Before committing to a base model for fine-tuning, a practitioner can compare Gemma 4, Qwen 3.6, and DeepSeek in the Model Arena using the same prompts and see side-by-side responses.
Who should look elsewhere: If you need one-click install from the Mac App Store, want a hosted cloud solution, or need AMD training support, Unsloth Studio is not ready for you yet. If you only need inference (no training), LM Studio or Ollama offer simpler installation and more polished chat UIs.
curl -fsSL https://unsloth.ai/install.sh | shirm https://unsloth.ai/install.ps1 | iexunsloth studio (or the appropriate binary). The app starts a local web server and opens your browser.Qwen3.5-4B) and click download. GGUF files are typically a few GB.http://localhost:8000/v1.For documentation, visit [unsloth.ai/docs/new/studio](https://unsloth.ai/docs/new/studio). For community support, join the Unsloth Discord (linked on the GitHub repo).
Unsloth Studio vs. LM Studio / Ollama. Both are excellent for local inference: they are easy to install, have polished GUIs, and support a wide range of models. However, neither offers fine-tuning. If your only goal is to run models locally, start with LM Studio or Ollama. If you need to train or adapt a model, Unsloth Studio is the only option that does both in one interface.
Unsloth Studio vs. Open WebUI / AnythingLLM. Those tools focus on chat, RAG, and document integration. They do not support training. Unsloth Studio’s chat interface is less full-featured for document Q&A, but it includes tool calling and code execution. Choose Unsloth Studio if you need fine-tuning; choose Open WebUI if you need a robust chat-with-documents pipeline.
Unsloth Studio vs. cloud fine-tuning platforms (e.g., Replicate, Hugging Face AutoTrain). Cloud platforms handle infrastructure but charge per hour of GPU usage and require internet. Unsloth Studio runs fully offline with no per-token costs, but you need your own GPU. For one-off experiments with free Colab credits, the cloud might be easier. For repeated training on sensitive data, local is the better bet.
In summary, Unsloth Studio is the first desktop app that makes local fine-tuning practical for non-Python developers. It is not a polished consumer app yet, but it fills a genuine gap for teams that want to train models on their own terms.
Train and adapt open models through a UI, with much lower GPU memory use than the standard stack.
Run your models locally with tool calling, web search, and an OpenAI-compatible API.
Turn PDFs, CSVs, and other files into training datasets without manual prep.