Foundational 7B LLM-based embedder fine-tuned with GPT-4-synthesized instruction data.
A 7B-parameter embedding model from intfloat (Microsoft Research), derived from Mistral-7B-v0.1 via LoRA fine-tuning and introduced in "Improving Text Embeddings with Large Language Models" (Wang et al., 2024). It pioneered training on GPT-4-generated synthetic instruction data spanning hundreds of embedding tasks and 93 languages, achieved state-of-the-art results on the MTEB benchmark at release, and serves as the base for many follow-ups, including SFR-Embedding-Mistral, Linq-Embed-Mistral, and GritLM-7B.
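Two usage details distinguish this model family from encoder-style embedders: queries are prefixed with a one-line task instruction (documents are not), and the embedding is read from the hidden state of the last non-padding token rather than mean-pooled. The sketch below illustrates both conventions with NumPy on dummy hidden states, so it does not require downloading the 7B weights; the helper names are illustrative, not part of any official API.

```python
import numpy as np

def format_query(task: str, query: str) -> str:
    # E5-Mistral prepends a one-line task instruction to each query;
    # passages/documents are embedded as plain text with no instruction.
    return f"Instruct: {task}\nQuery: {query}"

def last_token_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # hidden: (batch, seq_len, dim) final-layer hidden states
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    # The embedding is the hidden state at the LAST non-padding position.
    last_idx = attention_mask.sum(axis=1) - 1               # (batch,)
    pooled = hidden[np.arange(hidden.shape[0]), last_idx]   # (batch, dim)
    # L2-normalize so cosine similarity reduces to a dot product.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Dummy example: batch of 2, seq_len 3, dim 4, with one padded sequence.
hidden = np.arange(24, dtype=float).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],    # last real token at index 1
                 [1, 1, 1]])   # last real token at index 2
embeddings = last_token_pool(hidden, mask)
print(embeddings.shape)  # (2, 4)
```

In a real pipeline the `hidden` tensor would come from the model's final layer (e.g. via `transformers` with right-padding handled as in the model card), and retrieval scores are the dot products between normalized query and document embeddings.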