
239M multilingual embedder distilled from Qwen3-Embedding-4B onto a EuroBERT backbone.
A strong 0.212B-parameter dense embedding model from Jina AI. Treat the per-modality benchmarks as the leading indicator of fit; composite scoring across modalities is still maturing.
Cheapest current cloud rentals with at least 1 GB VRAM, refreshed hourly.
| GPU | Provider · Tier | VRAM | Cost / GPU-hour |
|---|---|---|---|
| NVIDIA GeForce RTX 5070 Ti | Vast.ai · Spot | 16 GB | $0.11 |
| NVIDIA GeForce RTX 3070 | RunPod · Community | 8 GB | $0.13 |
| NVIDIA GeForce RTX 3070 | RunPod · Spot | 8 GB | $0.13 |
| NVIDIA GeForce RTX 5090 | Vast.ai · Spot | 32 GB | $0.13 |
| NVIDIA GeForce RTX 4090 | Vast.ai · Spot | 24 GB | $0.13 |
Per-GPU rate across RunPod and the Vast.ai marketplace.
Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.
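One rough way to compare a spot rate against an on-demand rate is to inflate the spot price by the share of paid GPU time you expect to lose to interruptions and restarts. The sketch below is only an illustration: the helper name `effective_spot_cost` and the 20% loss figure are hypothetical assumptions, not measured values.

```python
def effective_spot_cost(spot_price_per_hour: float, lost_fraction: float) -> float:
    """Cost per hour of *useful* GPU time when part of each paid hour is lost.

    lost_fraction is a hypothetical share of GPU time wasted on interruptions,
    restarts, and recomputation (0.0 <= lost_fraction < 1.0).
    """
    if not 0.0 <= lost_fraction < 1.0:
        raise ValueError("lost_fraction must be in [0, 1)")
    # Only (1 - lost_fraction) of every paid hour produces results.
    return spot_price_per_hour / (1.0 - lost_fraction)


# Cheapest listed spot rate from the table, with an assumed 20% loss.
print(f"${effective_spot_cost(0.11, 0.20):.4f} per useful GPU-hour")
```

With these assumptions the $0.11 spot rate behaves like roughly $0.14 per useful hour; compare that figure, not the sticker price, with the on-demand rate.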
jina-embeddings-v5-text-nano is a 239M-parameter multilingual text embedding model from Jina AI, released February 18, 2026. It is the smallest member of the jina-embeddings-v5 family, distilled from the much larger Qwen3-Embedding-4B onto a EuroBERT-210M backbone. Despite its compact size, it scores 71.0 on MTEB English v2 and 65.5 on MMTEB, matching or exceeding larger models such as KaLM-mini-v2.5 (494M, more than twice its parameter count) and Gemma-300M (308M).
This model is purpose-built for retrieval, text matching, clustering, and classification tasks where latency and memory are constrained. It removes the need for cloud API calls by running efficiently on consumer hardware, making it a strong choice for local RAG pipelines, semantic search, and multilingual document processing on edge devices. The license is CC-BY-NC-4.0, which restricts commercial use but permits non-commercial research, prototyping, and internal development.
jina-embeddings-v5-text-nano uses a dense architecture built on EuroBERT-210M, a bidirectional encoder pretrained on 15 major European and global languages.
Key architectural features:
- Parameters: 239M per the model description (0.212B total in the parameter listing)
- Embedding dimension: 768
- Pooling: last-token
- Maximum sequence length: 8,192 tokens
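Last-token pooling uses the hidden state of each sequence's final non-padding token as the embedding. The NumPy sketch below illustrates the idea under stated assumptions: inputs are right-padded with a 0/1 attention mask, and outputs are L2-normalized. The function name `last_token_pool` and the normalization step are illustrative; Jina AI's actual pooling code may differ.

```python
import numpy as np


def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return one L2-normalized vector per sequence.

    hidden_states: (batch, seq_len, dim) encoder outputs.
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for right padding.
    """
    # Position of the last real token in each row (assumes right padding).
    last_idx = attention_mask.sum(axis=1) - 1
    pooled = hidden_states[np.arange(hidden_states.shape[0]), last_idx]
    # L2-normalize so that dot products equal cosine similarity.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Toy batch: 2 sequences, 4 positions, 768-dim hidden states.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])
emb = last_token_pool(hidden, mask)
print(emb.shape)  # (2, 768)
```

If a tokenizer pads on the left instead, the last real token always sits at position -1, and `hidden_states[:, -1]` is enough.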
jina-embeddings-v5-text-nano is a text-only embedding model that handles 108 languages, with primary training on 32 languages. It is designed for four core tasks: retrieval, text matching, clustering, and classification.
Concrete use cases include local RAG on a laptop for research papers, multilingual semantic search on customer support archives, and topic clustering for social media monitoring. Because the model is small, it can run alongside larger LLMs in a pipeline without consuming excessive VRAM. Note that it does not process images or audio; it handles text only.
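For semantic search, retrieval reduces to ranking documents by cosine similarity between a query embedding and the document embeddings. This sketch assumes the embeddings have already been produced by the model (768-dimensional in practice); the toy 3-dimensional vectors and the `top_k` helper are hypothetical stand-ins.

```python
import numpy as np


def top_k(query: np.ndarray, docs: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Rank document rows by cosine similarity to the query vector."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity for each document
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real 768-dim embeddings.
docs = np.array([
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.7, 0.7, 0.0],  # doc 2
])
query = np.array([1.0, 0.2, 0.0])
print(top_k(query, docs))
```

Here doc 0 ranks first and doc 2 second. In a real pipeline the document matrix is embedded once and reused, so only the query needs encoding at search time.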
This model is built for local deployment on consumer-grade hardware. Its small size makes it one of the few embedding models that fits comfortably on integrated GPUs and runs with acceptable throughput even on some CPUs.
| Quantization | Approximate VRAM | Notes |
|---|---|---|
| FP16 (default) | ~500 MB | Unquantized half precision, best quality |
| Q8_0 | ~270 MB | Minimal quality loss |
| Q4_K_M | ~150 MB | Recommended for most users |
| Q2_K | ~90 MB | Aggressive, suitable for memory-constrained devices |
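The VRAM column follows roughly from parameter count times bits per weight. A back-of-the-envelope sketch, assuming 239M parameters and approximate effective bits-per-weight values for each GGUF type. These bpw figures are assumptions: real files keep some tensors at higher precision, and runtime activations and buffers add overhead, so treat the results as lower bounds.

```python
PARAMS = 239e6  # parameter count from the model description

# Approximate effective bits per weight (assumed, not measured for this model).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # 8-bit weights plus one FP16 scale per 32-weight block
    "Q4_K_M": 4.85,  # typical mixed 4/6-bit K-quant average
    "Q2_K": 2.63,    # typical 2-bit K-quant average
}


def weight_megabytes(params: float, bits: float) -> float:
    """Weights-only footprint in MB (10**6 bytes)."""
    return params * bits / 8 / 1e6


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{weight_megabytes(PARAMS, bits):.0f} MB")

# Savings of Q4_K_M relative to FP16.
reduction = 1 - BITS_PER_WEIGHT["Q4_K_M"] / BITS_PER_WEIGHT["FP16"]
print(f"Q4_K_M is ~{reduction:.0%} smaller than FP16")
```

The estimates (about 478, 254, 145, and 79 MB) sit close to the table, and the roughly 70% reduction is consistent with the "over 60%" savings quoted for Q4_K_M.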
For most local deployments, use Q4_K_M (GGUF) or the equivalent MLX 4-bit quant. This preserves near-FP16 accuracy while cutting VRAM usage by over 60%. If you need maximum throughput and have ample VRAM, FP16 is safe. Binary quantization (1-bit) is supported but only recommended for retrieval tasks where speed is critical and recall is secondary.
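If "binary quantization" is read as quantizing the output embeddings to one bit per dimension (a common retrieval technique, assumed here rather than confirmed for this model), each 768-dimensional float32 vector shrinks from 3,072 bytes to 96 bytes, and similarity becomes a Hamming distance. A minimal NumPy sketch; `binarize` and `hamming` are illustrative helper names.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(embeddings > 0, axis=-1)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
vecs = rng.standard_normal((2, 768)).astype(np.float32)
codes = binarize(vecs)
print(vecs[0].nbytes, codes[0].nbytes)  # 3072 96
print(hamming(codes[0], codes[1]))  # lower means more similar
```

Because only signs survive, ranking quality drops compared with full-precision cosine similarity, which matches the advice to reserve this mode for cases where recall is secondary. A common mitigation is to re-score the top binary hits with the full-precision vectors.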
On an RTX 4090, expect more than 1,000 tokens/sec for a single sequence, with throughput rising as batch size grows. On an M4 Pro, expect about 600 tokens/sec, and on a desktop CPU with AVX2 support, about 80 tokens/sec with Q4_K_M.
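To translate these throughput figures into wall-clock time, divide the corpus size in tokens by the tokens-per-second rate. The sketch uses the single-sequence rates quoted above; the corpus of 10,000 documents averaging 500 tokens is a hypothetical example.

```python
# Single-sequence throughput figures quoted above (tokens/sec).
RATES = {
    "RTX 4090": 1000.0,  # quoted as ">1000", so real speed is likely higher
    "M4 Pro": 600.0,
    "Desktop CPU (AVX2, Q4_K_M)": 80.0,
}


def hours_to_embed(num_docs: int, avg_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock hours to embed a corpus at a fixed throughput."""
    return num_docs * avg_tokens / tokens_per_sec / 3600


for device, rate in RATES.items():
    print(f"{device}: {hours_to_embed(10_000, 500, rate):.1f} h")
```

For this 5M-token corpus the estimates come to roughly 1.4 h, 2.3 h, and 17.4 h. Treat the GPU figure as an upper bound on time, since batching raises throughput well beyond the single-sequence rate.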
The fastest way to evaluate the model locally is via Ollama (if supported) or directly through transformers, with peft installed:
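```python
from transformers import AutoModel

# trust_remote_code=True loads Jina AI's custom modeling code from the Hugging Face Hub
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v5-text-nano", trust_remote_code=True)
```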
For Elasticsearch users, the model is also available through Elastic Inference Service (EIS), which provides managed inference within your cluster.
jina-embeddings-v5-text-nano scores higher on MTEB (71.0 vs. ~69) despite having half the parameters. … sentence-transformers and GGUF. If you need commercial licensing or a larger context window (32K tokens), consider jina-embeddings-v5-text-small (677M) or the permissively licensed intfloat/multilingual-e5-small.
