ggml.org
Run models almost anywhere, from a laptop CPU to a server GPU.
GitHub Stars
118.4K
Contributors
1.8K
PyPI / Month
—
What the engine gives you out of the box, in plain language.
One engine for CPU, CUDA, ROCm, Vulkan, and Apple Metal backends.
Compact model files that fit large models into modest memory.
A built-in server with an OpenAI-compatible API and grammar-constrained output.
The jobs this engine is best suited for.
Run a model on a machine with no GPU at all.
Fit models onto small or constrained devices with quantization.
Build a custom runner on the same engine that powers many local apps.

Side-by-Side
Add a second or third engine and see stars, downloads, and capabilities lined up next to each other.