BidirLM

BidirLM-1.7B-Embedding

A 1.7B bidirectional encoder built from causal Qwen3-1.7B via masking + contrastive adaptation.

1.7B params · Dense

Our Take

Best for: Open-source embedding text workloads

A workable 1.7B-parameter dense embedding model from BidirLM. Treat the modality benchmarks below as the leading indicator of fit; composite scoring across modalities is still maturing.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

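Parameters: 1.7B

Active Params: 1.4B (appears to be the non-embedding count; as a dense model, all weights are used in every forward pass)

Architecture: Dense

Provider: BidirLM

Download Size: 3.5 GB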

Community

Monthly Downloads: 346

Likes: 6

Last Updated: 21 days ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.

License

Apache 2.0

Performance & Scoring

Benchmarks

Retrieval: 59.9

Classification: 65.9

Clustering: 51.5

STS: 74.2

MBA Open Score: 48.6 (C)

Benchmark (60% weight): 62.9

Popularity (25% weight): 5.6

Efficiency (15% weight): 63.0
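The composite appears to be a weighted sum of the three sub-scores using the 60% / 25% / 15% weights listed above. A minimal sketch, assuming a plain linear combination (the site's exact formula is not documented on this page), reproduces the published 48.6:

```python
# Assumption: the MBA Open Score is a plain weighted sum of its 0-100 sub-scores.
WEIGHTS = {"benchmark": 0.60, "popularity": 0.25, "efficiency": 0.15}


def composite_score(subscores, weights=WEIGHTS):
    """Combine sub-scores into one composite using fractional weights."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[name] * subscores[name] for name in weights)


score = composite_score({"benchmark": 62.9, "popularity": 5.6, "efficiency": 63.0})
print(round(score, 1))  # 48.6
```

Under this reading, the low popularity sub-score (5.6) is what pulls the composite well below the benchmark sub-score.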

Hardware Compatibility

See which devices can run this model and at what quality level.

102 devices (first 25 shown)

Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	S	1.4 GB
Acer Veriton GN100 AI Mini	Acer	S	1.4 GB
AMD Instinct MI300X	AMD	S	1.4 GB
AMD Instinct MI325X	AMD	S	1.4 GB
AMD Instinct MI355X	AMD	S	1.4 GB
AMD Radeon RX 7600 8GB	AMD	S	1.4 GB
AMD Radeon RX 7700 XT	AMD	S	1.4 GB
AMD Radeon RX 7800 XT	AMD	S	1.4 GB
AMD Radeon RX 7900 XT	AMD	S	1.4 GB
AMD Radeon RX 7900 XTX	AMD	S	1.4 GB
AMD Radeon RX 9070	AMD	S	1.4 GB
AMD Radeon RX 9070 XT	AMD	S	1.4 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	S	1.4 GB
Apple M4	Apple	S	1.4 GB
Apple M4 Max (40-core GPU)	Apple	S	1.4 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	S	1.4 GB
Apple M5	Apple	S	1.4 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	S	1.4 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	S	1.4 GB
Apple Mac Mini (M1, 2020)	Apple	S	1.4 GB
Apple Mac Mini (M2, 2023)	Apple	S	1.4 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	S	1.4 GB
Apple Mac Mini (M4, 2024)	Apple	S	1.4 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	S	1.4 GB
Apple Mac Studio (M1 Max, 2022)	Apple	S	1.4 GB

Rent in the Cloud

Cheapest current cloud rentals with at least 1 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA GeForce RTX 3080 (Vast.ai · Spot · 10 GB VRAM)	$0.03
NVIDIA GeForce RTX 3080 (Vast.ai · On-Demand · 10 GB VRAM)	$0.03
NVIDIA GeForce RTX 5060 Ti (Vast.ai · Spot · 16 GB VRAM)	$0.07
NVIDIA GeForce RTX 5060 Ti (Vast.ai · On-Demand · 16 GB VRAM)	$0.08
NVIDIA GeForce RTX 5070 Ti (Vast.ai · Spot · 16 GB VRAM)	$0.09
NVIDIA GeForce RTX 5070 TiVast.ai · Spot · 16 GB VRAM	$0.09

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.
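To turn an hourly rate into a job budget, you need a measured throughput for your own workload. The sketch below does that arithmetic; the 10,000 tokens/sec figure is a hypothetical placeholder, not a published number for this model, and the $0.07 rate is the RTX 5060 Ti spot price from the table.

```python
def embedding_job_cost(total_tokens, tokens_per_second, price_per_gpu_hour):
    """Estimate the rental cost (USD) of embedding a corpus on one GPU.

    tokens_per_second must come from your own benchmark; it is an
    assumption here, not a published figure for this model.
    """
    hours = total_tokens / tokens_per_second / 3600
    return hours * price_per_gpu_hour


# Hypothetical job: 1 billion tokens at an assumed 10,000 tokens/sec on a $0.07/hour spot GPU.
cost = embedding_job_cost(1_000_000_000, 10_000, 0.07)
print(f"${cost:.2f}")  # $1.94
```

On spot instances, add a margin for restarts, since interrupted batches must be re-encoded.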

About This Model

Overview

BidirLM-1.7B-Embedding is a 1.7-billion-parameter bidirectional encoder designed for text representation. The BidirLM team started from a causal decoder (Qwen3-1.7B) and converted it into an encoder with a two-stage pipeline: masked next-token prediction (MNTP), followed by contrastive adaptation. The result is a dense, text-only model with a mean MTEB Multilingual V2 score of 62.9, competitive with open-source embedding models twice its size.
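To make the first stage concrete, here is a minimal sketch of how one MNTP training example can be built, assuming the LLM2Vec-style formulation in which a masked token at position i is predicted from the representation at position i - 1, so the decoder's existing next-token head can be reused. Whether BidirLM uses exactly this shift, masking rate, or mask token is an assumption; `MASK_ID` and `mask_prob` are hypothetical.

```python
import random

MASK_ID = -1  # hypothetical placeholder id for the mask token


def mntp_example(token_ids, mask_prob=0.2, seed=0):
    """Build one (inputs, labels) pair for masked next-token prediction.

    labels[j] is the token id the representation at position j must predict,
    or None if position j carries no loss. A masked token at position i is
    predicted from position i - 1, so its label is stored at index i - 1.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [None] * len(token_ids)
    for i in range(1, len(token_ids)):  # position 0 has no predecessor to predict it
        if rng.random() < mask_prob:
            inputs[i] = MASK_ID
            labels[i - 1] = token_ids[i]
    return inputs, labels


print(mntp_example([101, 7, 8, 9], mask_prob=1.0))
```

With every eligible token masked, the inputs become `[101, -1, -1, -1]` and the labels `[7, 8, 9, None]`: each position is trained to recover the token that follows it.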

This model fits into the growing category of “encoder-only” systems derived from large language models, offering a practical alternative to classic BERT-style architectures. Unlike many embedding models that rely solely on contrastive learning, BidirLM-1.7B-Embedding first undergoes a masking phase that preserves the model’s ability to be fine-tuned on downstream tasks such as NER, classification, and NLI. For developers who need a single, multilingual embedding model that also works as a backbone for task-specific fine-tuning, this is a strong candidate.
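The core change when turning a decoder into an encoder is the attention pattern. The sketch below contrasts the two masks with NumPy; it is illustrative only and does not reproduce how BidirLM's remote code implements attention.

```python
import numpy as np


def attention_mask(seq_len, bidirectional):
    """Boolean matrix: entry [i, j] is True when query position i may attend to key position j."""
    full = np.ones((seq_len, seq_len), dtype=bool)
    # A causal decoder sees only the current and earlier positions (lower triangle);
    # a bidirectional encoder sees every position in the sequence.
    return full if bidirectional else np.tril(full)


print(attention_mask(3, bidirectional=False).astype(int))
print(attention_mask(3, bidirectional=True).astype(int))
```

Lifting the causal restriction lets every token's representation draw on context to its right, which is what makes the model useful as an embedding encoder.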

Architecture & Technical Details

BidirLM-1.7B-Embedding is a dense model with 1.7 billion parameters: all weights are active during every forward pass. This contrasts with mixture-of-experts (MoE) models, which route each token through only a subset of their parameters. That lowers compute per token, but all experts must still be held in memory for both inference and fine-tuning. At 1.7B parameters, the dense model is small enough to run on consumer hardware yet large enough to capture nuanced multilingual representations.

The architecture is built on the Qwen3 transformer with a hidden (embedding) dimension of 2048, and MTEB evaluation used a maximum sequence length of 512 tokens. The underlying Qwen3 backbone supports up to 32,768 tokens in principle. To handle longer documents, you can raise model.max_seq_length in Sentence Transformers or set max_length when calling the tokenizer directly, though doing so increases VRAM usage and may degrade retrieval quality if the model was not trained on sequences that long.

The model ships custom modeling code, so it must be loaded with trust_remote_code=True in Hugging Face's transformers or sentence-transformers libraries. It outputs 2048-dimensional vectors, twice the 1024 dimensions of other modern multilingual embedding models such as BGE-M3 and multilingual-e5-large. The larger dimension can improve retrieval fidelity but doubles the storage and memory needed for index vectors.
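The storage cost of the larger dimension is easy to estimate. This sketch computes the raw size of a flat vector index (no ANN graph or metadata overhead), assuming float32 storage unless another width is given:

```python
def index_size_gb(num_vectors, dim, bytes_per_value=4):
    """Raw storage for a flat vector index, excluding any index-structure overhead."""
    return num_vectors * dim * bytes_per_value / 1e9


for name, dim in [("BidirLM-1.7B-Embedding", 2048), ("BGE-M3 / multilingual-e5-large", 1024)]:
    print(f"{name}: {index_size_gb(1_000_000, dim):.3f} GB per million float32 vectors")
```

A million 2048-dimensional float32 vectors take about 8.2 GB versus about 4.1 GB at 1024 dimensions; storing vectors as float16 (`bytes_per_value=2`) closes that gap.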

Capabilities & Use Cases

BidirLM-1.7B-Embedding is primarily a sentence and document embedding model: it converts text into dense vectors that can be compared for similarity, clustering, classification, and retrieval. It can be applied across the MTEB task types: retrieval, classification, multi-label classification, clustering, pair classification, reranking, semantic textual similarity (STS), and bitext mining. It can also be fine-tuned for sequence classification (e.g., NLI, sentiment) and token classification (e.g., NER) via the standard transformers API.
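Comparing embeddings usually means cosine similarity. The sketch below ranks documents against a query with NumPy; the 2-D toy vectors are hypothetical stand-ins for the 2048-dimensional outputs of `model.encode(...)`.

```python
import numpy as np


def cosine_top_k(query_vec, doc_matrix, k=3):
    """Rank documents by cosine similarity to a query; returns (index, score) pairs."""
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q  # one cosine similarity per document
    best = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in best]


# Toy vectors stand in for real document and query embeddings.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
print(cosine_top_k(query, docs, k=2))
```

Normalizing once and using a dot product is the usual design choice: with unit-length vectors, the dot product equals cosine similarity, so a whole corpus can be scored with one matrix multiply.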

Key strengths:

Multilingual: Inheriting Qwen3’s vocabulary, the model covers 119 languages with robust performance on MTEB Multilingual V2. It was further reinforced with contrastive data from 87 languages, making it suitable for cross-lingual retrieval and zero-shot transfer.
Fine-tuning versatility: Thanks to the MNTP pre-training, it retains the ability to be fine-tuned on downstream tasks without catastrophic forgetting – something pure contrastive models often struggle with.
Open license: Apache 2.0 permits commercial use, research, modification, and fine-tuning, subject to its attribution and notice requirements.

Concrete use cases:

Semantic search: Embed queries and documents to power local search engines or RAG pipelines.
Zero-shot classification: Use cosine similarity between class descriptions and input text to classify without any training data.
Cross-lingual retrieval: Search a multilingual corpus using an English query.
NER/NLI fine-tuning: Adapt the model with a small amount of labeled data for specific domains (e.g., legal, medical).
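The zero-shot classification pattern can be sketched in a few lines: embed a short description per class, embed the inputs, and pick the most cosine-similar class. The vectors and labels below are hypothetical; in practice both sets would come from `model.encode(...)`.

```python
import numpy as np


def zero_shot_classify(text_vecs, label_vecs, labels):
    """Assign each text the label whose description embedding is most cosine-similar."""
    t = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    lv = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sims = t @ lv.T  # shape: (num_texts, num_labels)
    return [labels[i] for i in sims.argmax(axis=1)]


labels = ["sports", "finance"]
# Hypothetical embeddings of the label descriptions and of two input texts.
label_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
text_vecs = np.array([[0.9, 0.2], [0.1, 0.7]])
print(zero_shot_classify(text_vecs, label_vecs, labels))  # ['sports', 'finance']
```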

Running BidirLM-1.7B-Embedding Locally

This model is well suited to local deployment. At 1.7B parameters in full (float32) precision, the weights alone take roughly 6.8 GB of GPU memory (1.7 billion × 4 bytes). Half precision (FP16) halves that, and quantized builds reduce memory further and can improve throughput.

VRAM requirements by quantization:

Precision	VRAM (approx)	Use case
FP16	~3.4 GB	Recommended for maximum accuracy on RTX 4090, A-series, or M-series with >8 GB
Q8_0 (8-bit)	~1.7 GB	Good balance for lower-end GPUs
Q4_K_M (4-bit)	~1.1 GB	Works on 6–8 GB GPUs; minimal accuracy loss for embeddings
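The figures in this table follow from parameters × bits per weight. A minimal sketch of that arithmetic, covering weights only (activations, tokenizer, and framework overhead come on top):

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone; activations and framework overhead come on top."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 1.7e9
for name, bits in [("FP32", 32), ("FP16", 16), ("Q8_0 (8 bits, ignoring block scales)", 8)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Back-solving the table's ~1.1 GB for Q4_K_M gives roughly 5.2 bits per weight rather than 4, consistent with GGUF K-quant formats storing per-block scales (and keeping some tensors at higher precision) alongside the 4-bit values.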

Consumer hardware that can run it:

RTX 4090 (24 GB): FP16 or any quantized variant with ease; batch encoding of hundreds of sentences per second.
RTX 3060 12 GB: FP16, Q8_0, and Q4_K_M all fit comfortably.
M4 Max (64 GB unified memory): FP16 or Q8_0 with high throughput.
Apple M1/M2 (8–16 GB): FP16 fits; Q8_0 or Q4_K_M leaves more headroom on 8 GB machines.

Recommended quantization: Where a quantized build is available, Q4_K_M offers a good trade-off of memory, speed, and quality for most users. At about 1.1 GB, it leaves headroom for the tokenizer, batch activations, and multiple concurrent instances.

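Expected throughput: An encoder processes every input token in a single forward pass, with no token-by-token generation, so per-token rates from chat-style generation do not transfer. Throughput depends heavily on batch size and sequence length, and attention cost grows quadratically with length, so longer inputs (e.g., 2048 tokens) cost disproportionately more per sequence than 512-token inputs. Benchmark model.encode on your own hardware with representative batches before sizing a deployment.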
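A rough compute-bound estimate helps sanity-check any throughput figure. This sketch uses the common rule of thumb of about 2 FLOPs per parameter per token for a dense forward pass and ignores attention, so it is optimistic for long sequences; the peak TFLOPS and utilization values are hypothetical inputs, not specifications of any GPU listed here.

```python
def rough_tokens_per_second(num_params, peak_tflops, utilization):
    """Back-of-envelope encoder throughput from compute alone.

    Assumes ~2 FLOPs per parameter per token and ignores attention cost.
    peak_tflops and utilization are assumptions you must supply.
    """
    flops_per_token = 2 * num_params
    return peak_tflops * 1e12 * utilization / flops_per_token


# Hypothetical GPU: 100 TFLOPS peak at 30% achieved utilization.
print(f"{rough_tokens_per_second(1.7e9, 100, 0.3):,.0f} tokens/sec")
```

Treat the result as an order-of-magnitude ceiling; memory bandwidth, padding, and small batches all pull real numbers below it.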

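Quickstart with Ollama: The model is not officially packaged for Ollama. In principle you could convert it to GGUF with llama.cpp's conversion tools, quantize it, and import it with an Ollama Modelfile, but because the model depends on custom code (trust_remote_code=True) and bidirectional attention, the conversion may not work out of the box. The most direct path is Sentence Transformers: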

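from sentence_transformers import SentenceTransformer

# trust_remote_code=True is required because the model ships custom modeling code
model = SentenceTransformer("BidirLM/BidirLM-1.7B-Embedding", trust_remote_code=True)
embeddings = model.encode(["Hello, world"])  # NumPy array, one 2048-dim row per input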

How It Compares

BidirLM-1.7B-Embedding vs. BGE-M3 (BAAI/bge-m3)

BGE-M3 is a 567M parameter multilingual embedding model that also supports dense, sparse, and ColBERT-style retrieval. It is smaller and faster, with MTEB Multilingual scores around 60–61. BidirLM-1.7B-Embedding achieves higher scores (62.9) but requires more VRAM and compute. If you need hybrid search (dense + sparse) or are tight on GPU memory, BGE-M3 is the pragmatic choice. If maximum embedding quality and fine-tuning flexibility are priorities, BidirLM wins.

BidirLM-1.7B-Embedding vs. multilingual-e5-large (intfloat/multilingual-e5-large)

Multilingual-e5-large has about 560M parameters and scores around 60 on MTEB Multilingual. It is much faster and runs on almost any GPU. BidirLM provides a ~3-point improvement in mean task score, but at roughly 3× the parameter count. For high-throughput production embedding pipelines, e5-large is lighter; for accuracy-critical applications with moderate throughput demands, BidirLM is the better model.

BidirLM-1.7B-Embedding vs. BidirLM-0.6B

The 0.6B variant (based on Qwen3-0.6B) scores 59.6 on MTEB Multilingual V2. It needs roughly a third of the memory (about 1.2 GB of weights in FP16, versus about 3.4 GB for the 1.7B model) and runs faster. If embedding quality is the priority, the 1.7B model offers a meaningful uplift (62.9 vs. 59.6). For multilingual retrieval on a tight VRAM budget, the 0.6B model is a strong alternative.

Choose BidirLM-1.7B-Embedding when you need the best available open-source multilingual embedding from a 1.7B class model, with the ability to fine-tune for downstream tasks without switching frameworks.

Related Models

BidirLM

BidirLM-Omni-2.5B-Embedding

2.4B · Dense

BidirLM

BidirLM-1B-Embedding

1B · Dense
