BidirLM

BidirLM-1B-Embedding

Causal Gemma3-1B turned into a strong bidirectional embedder via masking-then-contrastive adaptation.

1B params · Dense

Our Take

Best for: open-source text embedding workloads

A workable 1B-parameter dense embedding model from BidirLM. Treat the per-task benchmarks below as the leading indicator of fit; composite scoring across modalities is still maturing.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 1B

Active Params: 0.698B

Architecture: Dense

Provider: BidirLM

Download Size: 2.0 GB

Community

Monthly Downloads: 535

Likes: 3

Last Updated: 21 days ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.


License

Gemma License

Performance & Scoring

Benchmarks

Retrieval: 56.5
Classification: 65.9
Clustering: 50.3
STS: 74.6

MBA Open Score: 49.6 (C)

Benchmark (60% weight): 61.8
Popularity (25% weight): 7.8
Efficiency (15% weight): 70.4
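The headline score is consistent with a weighted sum of the three components listed above. A minimal sketch, assuming that aggregation (the page does not state the formula, and the names below are illustrative):

```python
# Assumed aggregation: weighted sum of the component scores shown on this page.
COMPONENTS = {
    "benchmark": (61.8, 0.60),
    "popularity": (7.8, 0.25),
    "efficiency": (70.4, 0.15),
}

def composite_score(components):
    """Return the weighted sum of (score, weight) pairs."""
    return sum(score * weight for score, weight in components.values())

score = composite_score(COMPONENTS)
print(round(score, 1))  # 49.6, matching the listed MBA Open Score
```

The low popularity component pulls the composite well below the benchmark score, so the per-task numbers are the better guide to embedding quality.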

Hardware Compatibility

See which devices can run this model and at what quality level.


102 devices


Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	S	0.9 GB
Acer Veriton GN100 AI Mini	Acer	S	0.9 GB
AMD Instinct MI300X	AMD	S	0.9 GB
AMD Instinct MI325X	AMD	S	0.9 GB
AMD Instinct MI355X	AMD	S	0.9 GB
AMD Radeon RX 7600 8GB	AMD	S	0.9 GB
AMD Radeon RX 7700 XT	AMD	S	0.9 GB
AMD Radeon RX 7800 XT	AMD	S	0.9 GB
AMD Radeon RX 7900 XT	AMD	S	0.9 GB
AMD Radeon RX 7900 XTX	AMD	S	0.9 GB
AMD Radeon RX 9070	AMD	S	0.9 GB
AMD Radeon RX 9070 XT	AMD	S	0.9 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	S	0.9 GB
Apple M4	Apple	S	0.9 GB
Apple M4 Max (40-core GPU)	Apple	S	0.9 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	S	0.9 GB
Apple M5	Apple	S	0.9 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	S	0.9 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	S	0.9 GB
Apple Mac Mini (M1, 2020)	Apple	S	0.9 GB
Apple Mac Mini (M2, 2023)	Apple	S	0.9 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	S	0.9 GB
Apple Mac Mini (M4, 2024)	Apple	S	0.9 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	S	0.9 GB
Apple Mac Studio (M1 Max, 2022)	Apple	S	0.9 GB


Rent in the Cloud

Cheapest current cloud rentals with at least 1 GB VRAM, refreshed hourly.

GPU	Provider · Tier · VRAM	Cost / GPU-hour
NVIDIA GeForce RTX 5070 Ti	Vast.ai · Spot · 16 GB VRAM	$0.10
NVIDIA GeForce RTX 3070	RunPod · Community · 8 GB VRAM	$0.13
NVIDIA GeForce RTX 3070	RunPod · Spot · 8 GB VRAM	$0.13
NVIDIA GeForce RTX 5070 Ti	Vast.ai · On-Demand · 16 GB VRAM	$0.13
NVIDIA GeForce RTX 3080	RunPod · Community · 10 GB VRAM	$0.17

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.
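To compare rentals for a one-off embedding job, multiply the hourly rate by the expected runtime. A rough sketch using rates from the table above; the 4-hour runtime and the restart overhead for spot instances are made-up placeholders, not measured figures:

```python
def job_cost(runtime_hours, rate_per_gpu_hour, restart_overhead=0.0):
    """Estimated cost in dollars. restart_overhead is the extra fraction of
    runtime lost to interruptions (hypothetical; tune for your workload)."""
    return runtime_hours * (1.0 + restart_overhead) * rate_per_gpu_hour

# RTX 5070 Ti on Vast.ai: $0.10 spot vs $0.13 on-demand.
spot = job_cost(4, 0.10, restart_overhead=0.20)
on_demand = job_cost(4, 0.13)
print(f"spot ${spot:.2f} vs on-demand ${on_demand:.2f}")
```

Under these assumptions spot stays cheaper until restarts add roughly 30% to the runtime, which is the gap between the two hourly rates.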


About This Model

Overview

BidirLM-1B-Embedding is a dense 1‑billion‑parameter bidirectional text embedding model, adapted from the causal decoder Gemma3‑1B. Developed by BidirLM and released under the Gemma License, it converts a standard generative LLM into a strong encoder using a two‑stage recipe: masked next‑token prediction (MNTP) to unlock bidirectional attention, followed by contrastive fine‑tuning on multilingual data. The result is a compact embedder that scores 62.1 on MTEB Multilingual V2 (mean task) — competitive with much larger alternatives — while staying small enough to run on consumer hardware.

This model fills the gap between BERT‑style encoders (most of which have a few hundred million parameters or fewer) and large embedding models (which typically need datacenter GPUs). It inherits the representations learned by Gemma3, then specializes them for dense retrieval, semantic similarity, classification, and downstream fine‑tuning, all of which can run locally without cloud APIs.

Architecture & Technical Details

Parameters: 1,001M (dense)
Architecture: Dense transformer with full bidirectional attention
Embedding Dimension: 1152
Context Length: 512 tokens for MTEB evaluation; underlying Gemma3 backbone supports up to 32,768 tokens (adjust model.max_seq_length or max_length accordingly)
Modality: Text‑only
Languages: Multilingual — over 140 languages supported natively by the Gemma3 base, reinforced with contrastive training covering 87 languages
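The 1152-dimensional output determines how much storage a vector index needs. A back-of-the-envelope sketch, assuming embeddings are stored uncompressed with no index overhead (the function name and the corpus size are illustrative):

```python
EMBEDDING_DIM = 1152  # from the specification above

def index_size_gb(num_vectors, dim=EMBEDDING_DIM, bytes_per_value=4):
    """Raw storage for num_vectors embeddings, in decimal gigabytes."""
    return num_vectors * dim * bytes_per_value / 1e9

print(index_size_gb(1_000_000))                     # float32 storage
print(index_size_gb(1_000_000, bytes_per_value=2))  # float16 storage
```

At one million documents the raw float32 vectors come to about 4.6 GB, so the index can end up larger than the model itself.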

The adaptation process is critical: unlike contrastive‑only models, BidirLM first trains a Fill‑Mask checkpoint via MNTP, which teaches the causal LLM to attend bidirectionally. This step prevents catastrophic forgetting and makes the encoder effective for token‑level tasks (NER, classification) as well as generic embedding benchmarks. The final embedding model (BidirLM-1B-Embedding) is the MNTP checkpoint further tuned with contrastive losses.

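Because it is a dense model with no expert routing, weight memory scales linearly with parameter count and precision: about 1B parameters at FP16 (2 bytes each) comes to roughly 2 GB of GPU memory before activations. A mixture‑of‑experts model with a similar active‑parameter count would need more memory, because all of its experts must stay loaded even though only some of them run for each token.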
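The 2 GB figure follows from parameter count times bytes per parameter. A minimal sketch of that arithmetic; it covers weights only, so activations and framework overhead come on top:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 1_001_000_000  # 1,001M, from the specification above

print(weight_memory_gb(NUM_PARAMS, 2))  # FP16: 2 bytes per parameter
print(weight_memory_gb(NUM_PARAMS, 4))  # FP32: 4 bytes per parameter
```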

Capabilities & Use Cases

BidirLM-1B-Embedding targets two broad categories of work: general‑purpose text embeddings (via Sentence Transformers) and downstream fine‑tuning (via Hugging Face Transformers).

General Embeddings

Semantic Textual Similarity (STS) – producing high‑quality cosine similarities for sentence pairs
Dense Retrieval – encoding queries and documents for retrieval‑augmented generation or search
Clustering & Classification – sentence‑level tasks where a single vector per input suffices
Reranking & Bitext Mining – pair‑wise scoring and cross‑lingual alignment
Multilabel Classification – predicting multiple labels from a single representation
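Most of the tasks above reduce to comparing embedding vectors by cosine similarity. A small self-contained sketch of similarity scoring and retrieval ranking; the toy 3-dimensional vectors stand in for real 1152-dimensional embeddings, and the helper names are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_documents(query, docs)
print(ranking)  # [1, 2, 0]
```

In practice a vector library or database would do this over normalized embeddings in bulk, but the scoring rule is the same.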

Downstream Fine‑Tuning

Because the model was pre‑trained with MNTP, it retains strong token‑level understanding. You can fine‑tune it directly for:

Sequence classification (e.g., MNLI, XNLI, PAWS‑X)
Token classification (e.g., NER with PAN‑X, POS tagging)
Information retrieval (e.g., MIRACL, CodeSearchNet)
Sequence regression (e.g., Seahorse for quality estimation)

Concrete use cases: building a multilingual semantic search engine for 100+ languages, fine‑tuning a custom NER pipeline on legal documents, or replacing a heavier embedding model in a retrieval‑augmented pipeline on a single RTX 4090.

Running BidirLM-1B-Embedding Locally

This is where the model shines: you can run it on a single consumer GPU without compromises.

VRAM Requirements

FP16 (half precision): ~2 GB – fits on any modern GPU with 4 GB+ (GTX 1060 6GB, RTX 3050)
Q4_K_M (4‑bit quantization): ~1.2 GB – runs on integrated graphics or 2GB VRAM cards
Q8_0 (8‑bit): ~1.6 GB
Recommended: start with Q4_K_M, which cuts memory by about 40% vs FP16, usually with little quality loss. Check retrieval or similarity quality on your own data before committing.
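The savings can be checked directly from the approximate footprints quoted in the list above. A short sketch of that calculation (the dictionary simply restates those figures):

```python
# Approximate footprints quoted in the list above, in GB.
FOOTPRINT_GB = {"FP16": 2.0, "Q8_0": 1.6, "Q4_K_M": 1.2}

def saving_vs_fp16(fmt):
    """Percentage of memory saved relative to FP16."""
    return (1 - FOOTPRINT_GB[fmt] / FOOTPRINT_GB["FP16"]) * 100

for fmt in ("Q8_0", "Q4_K_M"):
    print(f"{fmt}: {saving_vs_fp16(fmt):.0f}% smaller than FP16")
```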

Real‑World Hardware

RTX 3070 / 4070: FP16 comfortably, batch size 8+ for retrieval
RTX 4090: Overkill – batch size 64+ at FP16, hundreds of tokens per second
Apple M4 Max (64GB unified): FP16 with room to spare; M1/M2 with 16GB also viable at Q4_K_M
Laptop GPUs (RTX 3050, 4GB): Q4_K_M is your friend – expect 20–40 tokens/second for single‑sequence encoding

Performance Expectations

Tokens‑per‑second depends on hardware and quantization. Rough estimates for a single sequence (512 tokens):

RTX 4090 (Q4_K_M): 500–700 t/s
RTX 3070 (Q4_K_M): 200–350 t/s
M4 Max (Q4_K_M): 300–450 t/s
RTX 4090 (FP16): 250–400 t/s

Processing documents in batches raises throughput roughly in proportion to batch size until the GPU's compute or memory is saturated; beyond that point larger batches stop helping.
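The single-sequence rates above translate into a conservative time budget for embedding a corpus. A rough sketch, assuming steady throughput and an illustrative corpus of 100,000 documents at 512 tokens each:

```python
def embedding_time_hours(num_docs, tokens_per_doc, tokens_per_second):
    """Wall-clock hours to encode a corpus at a steady throughput."""
    return num_docs * tokens_per_doc / tokens_per_second / 3600

# RTX 4090 at FP16, using the 250-400 t/s range listed above.
slow = embedding_time_hours(100_000, 512, 250)
fast = embedding_time_hours(100_000, 512, 400)
print(f"{fast:.1f} to {slow:.1f} hours")
```

Batched encoding should land well under this estimate, since these rates are for one sequence at a time.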

Getting Started Fast

The easiest way to run it locally is via [Ollama](https://ollama.com). While an official model may not be available on day one, you can pull the GGUF conversion from HuggingFace community repos or convert the model yourself with llama.cpp. Alternatively, use Sentence Transformers directly:

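from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the repository's custom bidirectional modeling code to load
model = SentenceTransformer("BidirLM/BidirLM-1B-Embedding", trust_remote_code=True)
# encode() returns one embedding vector per input string
embeddings = model.encode(["Your text here"])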

Note the trust_remote_code=True requirement – this is mandatory because BidirLM uses custom modeling code to enable bidirectional attention.

How It Compares

Model	Parameters	MTEB Multi. V2	Context	Multilingual	License
BidirLM-1B-Embedding	1.0B	62.1	512 (max 32K)	140+	Gemma License
BGE‑M3 (BAAI)	567M	~59.5	8192	100+	MIT
Stella‑400M	400M	~57.0	512	English‑focused	MIT

When to choose BidirLM‑1B: If you need strong multilingual embedding quality at around 1B parameters (62.1 on MTEB Multilingual V2 in the table above), especially for languages beyond English. The MNTP stage also makes it well suited to fine‑tuning on token‑level tasks, which BGE‑M3 and Stella, both tuned primarily as sentence‑level embedders, are not aimed at. The 32K token ceiling (via the Gemma3 backbone) also allows longer documents when you increase max_seq_length, though MTEB scores are only validated at 512.

When to choose an alternative: If you’re constrained to a sub‑500M model (e.g., edge devices), Stella‑400M or BGE‑Small need less VRAM. For English‑only retrieval where throughput matters more than multilingual quality, BGE‑M3 is smaller and permissively licensed. BidirLM’s Gemma License imposes usage restrictions (see the license for details); if Apache‑2.0 or MIT is mandatory, the alternatives are safer.

Related Models

BidirLM

BidirLM-Omni-2.5B-Embedding

2.4B · Dense

BidirLM

BidirLM-1.7B-Embedding

1.7B · Dense
