Linq AI

Linq-Embed-Mistral

Top-tier 7B retrieval embedder from Linq AI Research using refined synthetic data and hard-negative mining.

7.1B params · Dense

Our Take

Best for: Open-weight text embedding workloads (non-commercial use)

A workable 7.1B-parameter dense embedding model from Linq AI. Treat the per-task benchmarks below as the leading indicator of fit; composite scoring across modalities is still maturing.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 7.1B

Active Params: 7.1B (dense, so all parameters are active)

Architecture: Dense

Provider: Linq AI

Download Size: 14.3 GB

Community

Monthly Downloads: 17.7K

Likes: 142

Last Updated: 2 years ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.

License

CC-BY-NC-4.0

Performance & Scoring

Benchmarks

MTEB Overall: 68.2

Retrieval: 58.7

Classification: 62.2

Clustering: 50.6

STS: 74.9

MBA Open Score: 52.8 (CC)

Benchmark (60% weight): 62.9

Popularity (25% weight): 44.4

Efficiency (15% weight): 25.9
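The headline score appears to be a weighted sum of the three component scores, using the weights listed above. The sketch below assumes a plain linear combination. The site's exact formula and rounding are not documented here, so treat this as a reconstruction rather than the official calculation.

```python
# Hypothetical reconstruction: assumes the composite is a plain weighted sum
# of the three displayed component scores (weights taken from the page).
WEIGHTS = {"benchmark": 0.60, "popularity": 0.25, "efficiency": 0.15}


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores; the weights must sum to 1."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


score = composite_score({"benchmark": 62.9, "popularity": 44.4, "efficiency": 25.9})
print(f"{score:.1f}")  # 52.7
```

The result is close to the displayed 52.8. A gap this small is consistent with the components being shown rounded to one decimal, or with the site using a slightly different formula.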

Hardware Compatibility

See which devices can run this model and at what quality level.

102 devices


Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	SS	4.8 GB
Acer Veriton GN100 AI Mini	Acer	SS	4.8 GB
AMD Instinct MI300X	AMD	SS	4.8 GB
AMD Instinct MI325X	AMD	SS	4.8 GB
AMD Instinct MI355X	AMD	SS	4.8 GB
AMD Radeon RX 7600 8GB	AMD	SS	4.8 GB
AMD Radeon RX 7700 XT	AMD	SS	4.8 GB
AMD Radeon RX 7800 XT	AMD	SS	4.8 GB
AMD Radeon RX 7900 XT	AMD	SS	4.8 GB
AMD Radeon RX 7900 XTX	AMD	SS	4.8 GB
AMD Radeon RX 9070	AMD	SS	4.8 GB
AMD Radeon RX 9070 XT	AMD	SS	4.8 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	SS	4.8 GB
Apple M4	Apple	SS	4.8 GB
Apple M4 Max (40-core GPU)	Apple	SS	4.8 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	SS	4.8 GB
Apple M5	Apple	SS	4.8 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	SS	4.8 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	SS	4.8 GB
Apple Mac Mini (M1, 2020)	Apple	SS	4.8 GB
Apple Mac Mini (M2, 2023)	Apple	SS	4.8 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	SS	4.8 GB
Apple Mac Mini (M4, 2024)	Apple	SS	4.8 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	SS	4.8 GB
Apple Mac Studio (M1 Max, 2022)	Apple	SS	4.8 GB

Rent in the Cloud

Cheapest current cloud rentals with at least 5 GB VRAM, refreshed hourly.

GPU	Provider · Tier · VRAM	Cost / GPU-hour
NVIDIA GeForce RTX 5070 Ti	Vast.ai · Spot · 16 GB VRAM	$0.11
NVIDIA GeForce RTX 3070	RunPod · Community · 8 GB VRAM	$0.13
NVIDIA GeForce RTX 3070	RunPod · Spot · 8 GB VRAM	$0.13
NVIDIA GeForce RTX 5090	Vast.ai · Spot · 32 GB VRAM	$0.13
NVIDIA GeForce RTX 4090	Vast.ai · Spot · 24 GB VRAM	$0.13

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.
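To turn an hourly rate into a per-corpus cost, divide the corpus size by throughput. The sketch below is a back-of-the-envelope estimate built on stated assumptions:

- 150 docs/s is the midpoint of the rough RTX 4090 estimate in the performance section.
- $0.13 is one of the listed spot prices.
- `overhead` is a hypothetical padding factor for model loading and spot restarts.

Substitute your own measurements for all three.

```python
def embedding_cost_usd(
    n_docs: int,
    docs_per_second: float,
    usd_per_gpu_hour: float,
    overhead: float = 1.0,
) -> float:
    """Estimated GPU rental cost to embed n_docs documents.

    overhead > 1.0 pads the estimate for model loading, spot-instance
    restarts, and work lost to interruptions.
    """
    hours = n_docs / docs_per_second / 3600
    return hours * overhead * usd_per_gpu_hour


# Assumed inputs: 1M short documents, 150 docs/s, $0.13 per GPU-hour.
cost = embedding_cost_usd(1_000_000, 150, 0.13)
print(f"${cost:.2f}")  # $0.24
```

On spot instances, an `overhead` of 1.2 to 1.5 is a cautious starting point. Checkpoint progress so that an interruption does not force a full restart.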

About This Model

Overview

Linq-Embed-Mistral is a 7.1B-parameter dense text embedding model from Linq AI Research, built specifically for retrieval. Released in May 2024, it scored 60.2 on the MTEB retrieval average, the top retrieval score on the leaderboard at launch. It is not a general-purpose chat model or a code generator. It is a specialized embedder that converts text into high-dimensional vectors for semantic search, retrieval-augmented generation (RAG), and document ranking.

The model builds on Mistral-7B-v0.1 and E5-mistral-7b-instruct. Linq AI's contribution lies in the training methodology: refined synthetic data generation paired with hard-negative mining. On the MTEB retrieval average it outscores alternatives such as BGE-M3 and SFR-Embedding-Mistral. The hard-negative training specifically targets the ability to distinguish relevant documents from misleading or superficially similar ones.

At 7.1B parameters, Linq-Embed-Mistral is large for an embedder but still practical to deploy locally. It is big enough to capture nuanced semantic relationships and small enough to run on consumer GPUs once quantized. Its CC-BY-NC-4.0 license permits research and other non-commercial use. Internal tooling at a company can still count as commercial use, so check the license terms before adopting the model there.

Architecture & Technical Details

Linq-Embed-Mistral uses a dense transformer architecture with 7.1B parameters. Mixture-of-Experts (MoE) models activate only a subset of their parameters for each token. A dense model, by contrast, runs every parameter on every forward pass, so compute per token is fixed. Either way, all weights must be resident in memory: about 28 GB in full precision (FP32), about 14 GB in FP16/BF16, and roughly 4 to 5 GB with 4-bit quantization. Activation memory comes on top of this and grows with batch size and input length.
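Weight-memory figures for a dense model follow from simple arithmetic: parameters × bits per weight ÷ 8 gives bytes. In the sketch below, the FP32 and FP16 widths are exact. The llama.cpp quantization widths are approximate averages; K-quant files mix block types, so real GGUF sizes differ slightly.

```python
# Approximate effective bits per weight. FP32/FP16 are exact; the llama.cpp
# quantization figures are typical averages and vary slightly between files.
BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:7s} ~{weight_memory_gb(7.1e9, fmt):.1f} GB")
```

For 7.1B parameters, this gives about 28.4 GB (FP32), 14.2 GB (FP16), 7.5 GB (Q8_0), 5.1 GB (Q5_K_M), and 4.3 GB (Q4_K_M). Leave headroom for activations and runtime buffers on top of these figures.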

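The model uses last-token pooling, as in E5-mistral: the final hidden state of the last non-padding token becomes a single fixed-size vector per input text. That vector has 4,096 dimensions, matching Mistral-7B's hidden size. The vectors are typically L2-normalized and compared with cosine similarity, which works with most vector databases and similarity search libraries.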
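For reference, the E5-mistral lineage's example code pools by taking the hidden state of the last non-padding token rather than averaging all tokens. The NumPy sketch below implements that last-token pooling. It was written for this page, not copied from the model card. It assumes `hidden` has shape (batch, seq_len, dim) and `mask` is the 0/1 attention mask, and it handles both left- and right-padded batches.

```python
import numpy as np


def last_token_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return the hidden state of each sequence's last real (non-padding) token."""
    # Left padding: every row ends with a real token, so take the final position.
    if np.all(mask[:, -1] == 1):
        return hidden[:, -1]
    # Right padding: the last real token sits at index (number of real tokens - 1).
    last_idx = mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last_idx]


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy batch: 2 sequences, 3 positions, 2-dim hidden states, right-padded.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = last_token_pool(hidden, mask)  # rows [4., 5.] and [8., 9.]
print(l2_normalize(pooled))
```

With a real model, `hidden` would be the transformer's last hidden state, and `dim` would be 4,096.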

Context length is not clearly stated in the model card. Mistral-7B-v0.1 uses a 4,096-token sliding attention window, and the E5-mistral usage examples truncate inputs at 4,096 tokens. Treat roughly 4K tokens as the practical limit unless you validate longer inputs yourself. That covers most documents, code snippets, and retrieval corpus entries; longer documents still need chunking.
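A common chunking strategy is fixed-size windows with a small overlap, so that sentences near a boundary appear in two chunks. The sketch below operates on an already-tokenized list. The 4,096-token default window and 256-token overlap are illustrative assumptions, not values from the model card. In practice, count tokens with the model's own tokenizer and leave room for any instruction prefix.

```python
def chunk_tokens(tokens: list, max_len: int = 4096, overlap: int = 256) -> list[list]:
    """Split a token list into windows of at most max_len sharing `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Toy example with tiny windows so the overlap is easy to see.
print(chunk_tokens(list(range(10)), max_len=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```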

The training pipeline is the key differentiator. Linq AI combined existing benchmark datasets with synthetic data generated by larger LLMs, then applied task-specific hard-negative mining. Hard negatives are documents that look relevant to a query but do not actually answer it. Training on them forces the model to learn fine-grained distinctions rather than surface-level similarity, which should translate into better retrieval precision in production systems.
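To make the idea concrete, the sketch below shows a generic hard-negative selection step:

1. Score candidate passages with a first-stage retriever.
2. Drop the known positives.
3. Keep the highest-scoring remainder as hard negatives.

It also skips candidates that score above a fraction of the positive's score, a common heuristic for filtering out unlabeled true matches (false negatives). This is not Linq AI's published pipeline. The function name, the `max_ratio` threshold, and the toy scores are assumptions for illustration.

```python
def mine_hard_negatives(
    scores: dict[str, float],
    positives: set[str],
    k: int = 2,
    max_ratio: float = 0.95,
) -> list[str]:
    """Pick the k highest-scoring non-positive passages as hard negatives.

    scores maps passage id -> similarity to the query (from a first-stage retriever).
    Candidates scoring above max_ratio * (best positive score) are skipped,
    since near-ties with the positive are often unlabeled true matches.
    """
    ceiling = max_ratio * max(scores[p] for p in positives)
    candidates = [
        (s, pid) for pid, s in scores.items() if pid not in positives and s <= ceiling
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in candidates[:k]]


scores = {"pos": 0.90, "a": 0.89, "b": 0.80, "c": 0.75, "d": 0.20}
print(mine_hard_negatives(scores, {"pos"}))  # ['b', 'c']
```

Passage "a" is excluded because it scores almost as high as the positive, so it is more likely a missed positive than a true negative.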

Capabilities & Use Cases

Linq-Embed-Mistral excels at text retrieval: finding the most relevant documents in a corpus for a natural-language query. Retrieval is its strongest area on MTEB, and at release it led the MTEB retrieval leaderboard.

Concrete use cases:

RAG pipelines: As the embedding backbone for retrieval-augmented generation. The model's high recall at low rank (94.5% recall@10 on ArguAna) means fewer chunks need to be passed to the generator, reducing latency and token costs.
Semantic search: Replacing keyword-based search in internal documentation, codebases, or knowledge bases. The diverse synthetic training data may help with domain-specific terminology, but validate on your own corpus.
Document deduplication and clustering: With a V-measure of 51.5 on ArxivClusteringP2P, it can group related research papers or support tickets without manual tagging.
Reranking: The 79.6 MRR on AskUbuntuDupQuestions shows strong ability to reorder candidate results, useful as a second-pass ranker after a cheaper initial retrieval.
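The recall@k and MRR figures above can be computed from ranked result lists. The sketch below assumes one ranked list of document ids per query and a set of relevant ids per query. The toy data is made up and unrelated to the ArguAna or AskUbuntu results.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / (1-based rank of the first relevant result), or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Two toy queries: (ranked results, relevant ids).
runs = [
    (["d3", "d1", "d7"], {"d1"}),
    (["d2", "d9", "d4"], {"d4", "d5"}),
]
mean_recall = sum(recall_at_k(r, rel, k=3) for r, rel in runs) / len(runs)
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
print(mean_recall, mrr)  # 0.75 and about 0.417
```

Both metrics are averaged over queries. Recall@k rewards finding all relevant documents within the cutoff, while MRR rewards placing the first relevant one near the top.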

The model is text-only and English-focused based on available benchmarks. It is not designed for multimodal tasks, code generation, or conversational AI.

Running Linq-Embed-Mistral Locally

Local deployment is practical on mid-range to high-end consumer hardware. Here's what you need:

Minimum VRAM requirements by quantization (approximate: weights plus modest runtime overhead; larger batches and longer inputs need more):

Quantization	Approx. VRAM	Quality Impact
Q4_K_M	~4.5–5 GB	Minimal degradation, recommended default
Q5_K_M	~5.5–6 GB	Near-lossless, for critical accuracy
Q8_0	~8 GB	Virtually lossless
FP16	~15 GB	Unquantized 16-bit weights, needs a high-VRAM card

Recommended hardware:

Consumer GPU: An RTX 4090 (24 GB) runs Q5_K_M or FP16 comfortably, and an RTX 3090 (24 GB) has the same capacity. An RTX 4070 Ti (12 GB) handles Q4_K_M and Q5_K_M easily, and Q8_0 (roughly 8 GB of weights) fits as well.
Apple Silicon: An M4 Max with 48 GB of unified memory runs FP16 without issues. An M3 Pro with 18 GB can run Q4_K_M.
CPU-only: Not recommended for production. Embedding throughput on CPU is far lower than on a GPU, so benchmark on your own hardware before committing.

Expected performance (rough estimates rather than measured benchmarks). For an embedder, tokens/second counts input tokens processed, since no text is generated.

On an RTX 4090 with Q4_K_M quantization, expect:

Embedding: 200–400 tokens/second (batch size 1)
Throughput: 100–200 documents/second for short texts (batch size 64)

On an M4 Max (48 GB):

Embedding: 100–200 tokens/second
Throughput improves with larger batch sizes
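Because these figures are rough, it is worth measuring on your own hardware. The timing sketch below is library-agnostic. `encode_fn` stands in for any callable that embeds a list of strings. The dummy encoder exists only so the snippet runs without downloading the model; it is hypothetical, not part of any library.

```python
import time
from typing import Callable, Sequence


def docs_per_second(
    encode_fn: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    batch_size: int = 64,
    warmup_batches: int = 1,
) -> float:
    """Time encode_fn over texts in fixed-size batches; return documents per second."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # Warm-up calls keep one-time costs (kernel setup, caches) out of the timing.
    for batch in batches[:warmup_batches]:
        encode_fn(batch)
    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(texts) / elapsed


def dummy_encode(batch: Sequence[str]) -> list[list[float]]:
    """Stand-in for a real model: one fake 4-dimensional vector per text."""
    return [[float(len(t))] * 4 for t in batch]


rate = docs_per_second(dummy_encode, ["short text"] * 1000)
print(f"{rate:,.0f} docs/s with the dummy encoder")
```

With sentence-transformers, you could pass `functools.partial(model.encode, batch_size=64)` as `encode_fn`. That way, the library's internal batching matches the outer loop.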

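Quick start with Ollama. The tag below assumes a GGUF build published under this name; check the Ollama library for the exact tag, as community builds may be named differently: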

ollama pull linq-embed-mistral
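Once a build is pulled, Ollama serves embeddings over its local HTTP API. The sketch below uses only the standard library and rests on three assumptions:

- A recent Ollama release that exposes `POST /api/embed` on the default port 11434. Older releases used `/api/embeddings` with a single `prompt` field.
- The model is available under the tag from the pull command.
- `build_embed_request` and `embed` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # default local Ollama endpoint


def build_embed_request(model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(model: str, texts: list[str]) -> list[list[float]]:
    """Send the request to a running Ollama server; return one vector per text."""
    with urllib.request.urlopen(build_embed_request(model, texts)) as resp:
        return json.loads(resp.read())["embeddings"]


# Requires `ollama serve` running and the model pulled under this (assumed) tag:
# vectors = embed("linq-embed-mistral", ["What is hard-negative mining?"])
```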

Alternatively, load the original weights from Hugging Face with sentence-transformers:

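from sentence_transformers import SentenceTransformer

# Downloads the unquantized weights (about 14 GB) from Hugging Face on first use.
model = SentenceTransformer("Linq-AI-Research/Linq-Embed-Mistral")
# Returns a NumPy array with one embedding row per input text.
embeddings = model.encode(["Your text here"])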
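For retrieval, models in the E5-mistral lineage expect each query to carry a one-line task instruction, while documents are embedded as-is. The `Instruct: ...\nQuery: ...` template below follows that convention, but check the current model card for the exact wording. Ranking is plain cosine similarity over the returned vectors. Toy 3-dimensional vectors stand in for real embeddings so the snippet runs on its own.

```python
import numpy as np


def format_query(task: str, query: str) -> str:
    """Prefix a query with a task instruction (E5-mistral-style template)."""
    return f"Instruct: {task}\nQuery: {query}"


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k].tolist()


print(format_query("Given a question, retrieve passages that answer the question",
                   "How do I rotate logs?"))
docs = np.array([[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]])
print(top_k_cosine(np.array([1.0, 0.1, 0.0]), docs, k=2))  # [0, 1]
```

With the real model, encode the query with `model.encode(format_query(task, q))`, which returns a 1-D array for a single string. Encode the documents with `model.encode(docs)`, then pass both arrays to `top_k_cosine`.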

For production pipelines, consider exporting to ONNX or TensorRT for lower latency. The model also loads directly from Hugging Face with the transformers library.

How It Compares

vs. BGE-M3 (BAAI)

BGE-M3 is a 567M parameter multilingual embedder with support for dense, sparse, and ColBERT-style retrieval. It's significantly smaller than Linq-Embed-Mistral, meaning it runs faster and on less hardware (Q4_K_M fits in 2 GB). However, Linq-Embed-Mistral outperforms BGE-M3 on English retrieval tasks by a meaningful margin (60.2 vs. ~55.0 on MTEB retrieval). Choose BGE-M3 if you need multilingual support or ultra-low latency; choose Linq-Embed-Mistral if English retrieval accuracy is your priority and you have the VRAM.

vs. SFR-Embedding-Mistral (Salesforce)

SFR-Embedding-Mistral is the closest comparison: like Linq-Embed-Mistral, it fine-tunes E5-mistral on a Mistral-7B base. Linq-Embed-Mistral scores 60.2 on MTEB retrieval versus SFR's 59.0, a gain attributed to better data curation and hard-negative mining. The difference should be most visible in edge cases, meaning documents that look relevant but aren't. Moving from SFR to Linq-Embed-Mistral requires no hardware change. However, the two models' vectors are not interchangeable, so the whole corpus must be re-embedded. If you're starting fresh, Linq-Embed-Mistral is the stronger choice on retrieval benchmarks.

Tradeoffs:

Linq-Embed-Mistral is larger than many embedders, requiring more VRAM and compute.
Its published evaluations are English-focused. For multilingual retrieval, look at BGE-M3 or multilingual-e5.
The CC-BY-NC-4.0 license prohibits commercial use without permission. Verify your use case before deploying in production.

Free Monthly Report

The AI Build Report

The state of AI models, API prices, and what to run where. New every month, free.
