InternLM
Compress, deploy, and serve open models with high throughput.
GitHub Stars
7.9K
Contributors
140
PyPI / Month
52.8K
What the engine gives you out of the box, in plain language.
A fast inference backend tuned for throughput on NVIDIA GPUs.
Shrink models with weight and KV cache quantization to save memory.
Serves a familiar API so existing clients connect without changes.
The jobs this engine is best suited for.
Serve an open model to a busy app with low cost per token.
Use quantization to run a bigger model on a smaller card.
Point existing OpenAI clients at your own GPU box.

Side-by-Side
Add a second or third engine and see stars, downloads, and capabilities lined up next to each other.