NVIDIA
NVIDIA's engine for the fastest inference on NVIDIA GPUs.
GitHub Stars
14.0K
Contributors
411
PyPI / Month
11.1K
What the engine gives you out of the box, in plain language.
Turns a model into a hardware-tuned engine for faster runs on NVIDIA GPUs.
Adds and removes requests from the batch on the fly to keep the GPU busy.
trtllm-serve exposes a local endpoint that mirrors the OpenAI API.
The jobs this engine is best suited for.
Serve a model where every millisecond of response time matters.
Get the most throughput per card on NVIDIA hardware.
Split a big model across several NVIDIA GPUs.

Side-by-Side
Add a second or third engine and see stars, downloads, and capabilities lined up next to each other.