Institute of Computing Technology

BOOM_4B_v1

A Qwen3-4B text embedder trained with bagging-based model merging for OOD-robust retrieval.

4B params · Dense

Our Take

Best for: Open-source text embedding workloads

A workable 4B-parameter dense embedding model from the Institute of Computing Technology. Treat the per-task benchmarks below as the leading indicator of fit; composite scoring across modalities is still maturing.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 4B

Active Params: 3.6B (non-embedding count; the model is dense, so all parameters are used at inference)

Architecture: Dense

Provider: Institute of Computing Technology

Download Size: 16.1 GB

Community

Monthly Downloads: 37

Likes: 1

Last Updated: 1 month ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.


License

Apache 2.0

Performance & Scoring

Benchmarks

Task	Score
Retrieval	62.2
Classification	66.9
Clustering	52.8
STS	74.4

MBA Open Score: 45.7 (grade C)

Component	Weight	Score
Benchmark	60%	64.1
Popularity	25%	0.0
Efficiency	15%	48.1
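The page does not state how the Open Score is computed, but the listed numbers are consistent with a weighted sum in which the Benchmark component is the plain mean of the four task scores. The sketch below reproduces that arithmetic; treating the Benchmark component as an unweighted mean is an inference from the numbers, not a documented formula.

```python
# Per-task scores and component weights as listed on this page.
TASK_SCORES = {
    "Retrieval": 62.2,
    "Classification": 66.9,
    "Clustering": 52.8,
    "STS": 74.4,
}
WEIGHTS = {"Benchmark": 0.60, "Popularity": 0.25, "Efficiency": 0.15}


def benchmark_component(task_scores):
    """Assumption: the Benchmark component is the unweighted mean of task scores."""
    return sum(task_scores.values()) / len(task_scores)


def open_score(components, weights):
    """Weighted sum of the component scores."""
    return sum(weights[name] * components[name] for name in weights)


components = {
    "Benchmark": benchmark_component(TASK_SCORES),
    "Popularity": 0.0,
    "Efficiency": 48.1,
}
print(round(components["Benchmark"], 1))  # 64.1
print(round(open_score(components, WEIGHTS), 1))  # 45.7
```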

Hardware Compatibility

See which devices can run this model and at what quality level.


102 devices (first 25 shown)


Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + Arc A770)	ACEMAGIC	S	2.7 GB
Acer Veriton GN100 AI Mini	Acer	S	2.7 GB
AMD Instinct MI300X	AMD	S	2.7 GB
AMD Instinct MI325X	AMD	S	2.7 GB
AMD Instinct MI355X	AMD	S	2.7 GB
AMD Radeon RX 7600 8GB	AMD	S	2.7 GB
AMD Radeon RX 7700 XT	AMD	S	2.7 GB
AMD Radeon RX 7800 XT	AMD	S	2.7 GB
AMD Radeon RX 7900 XT	AMD	S	2.7 GB
AMD Radeon RX 7900 XTX	AMD	S	2.7 GB
AMD Radeon RX 9070	AMD	S	2.7 GB
AMD Radeon RX 9070 XT	AMD	S	2.7 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	S	2.7 GB
Apple M4	Apple	S	2.7 GB
Apple M4 Max (40-core GPU)	Apple	S	2.7 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	S	2.7 GB
Apple M5	Apple	S	2.7 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	S	2.7 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	S	2.7 GB
Apple Mac Mini (M1, 2020)	Apple	S	2.7 GB
Apple Mac Mini (M2, 2023)	Apple	S	2.7 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	S	2.7 GB
Apple Mac Mini (M4, 2024)	Apple	S	2.7 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	S	2.7 GB
Apple Mac Studio (M1 Max, 2022)	Apple	S	2.7 GB


Rent in the Cloud

Cheapest current cloud rentals with at least 3 GB VRAM, refreshed hourly.

GPU	Provider	Pricing	VRAM	Cost / GPU-hour
NVIDIA L4	Vast.ai	Spot	24 GB	$0.03
NVIDIA L4	Vast.ai	On-Demand	24 GB	$0.04
NVIDIA GeForce RTX 5060 Ti	Vast.ai	Spot	16 GB	$0.09
NVIDIA GeForce RTX 5060 Ti	Vast.ai	On-Demand	16 GB	$0.10
NVIDIA GeForce RTX 5070 Ti	Vast.ai	Spot	16 GB	$0.11

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.


About This Model

Overview

BOOM_4B_v1 is a 4-billion-parameter dense text embedding model developed by the Institute of Computing Technology (ICT), Chinese Academy of Sciences. It is built on Qwen3-4B and fine-tuned for general-purpose text representation, with a specific focus on out-of-domain (OOD) robustness. It is trained with BOOM, a bagging-based model merging technique: several embedding models are trained on sampled subsets of the training data and then merged into a single model. This approach improves both in-domain and OOD retrieval performance while keeping inference cost identical to that of a single model.

The model occupies a specific niche: large-scale text embedders. Many widely used embedding models, such as BGE-M3 and the standard E5 and GTE checkpoints, have fewer than 1B parameters. At 4B, BOOM_4B_v1 targets applications that demand higher representational capacity, particularly retrieval-augmented generation (RAG), enterprise search, and semantic similarity tasks where robustness across unseen domains matters. It competes with other large embedders such as intfloat/e5-mistral-7b-instruct (7B) and Alibaba's gte-Qwen2-1.5B-instruct (1.5B), offering a middle ground in size and a distinct training methodology.

Architecture & Technical Details

BOOM_4B_v1 is a dense transformer, not a mixture-of-experts (MoE) model, so all of its parameters are active during inference. It is initialized from Qwen3-4B and then trained for sentence embedding using last-token pooling. The base model supports a native context length of 32,768 tokens (per the Qwen3-4B specification), so long documents can in principle be encoded in a single pass.
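Last-token pooling takes the hidden state of the final real (non-padding) token in each sequence as that sequence's embedding. The following is a minimal, framework-free sketch of that step, assuming the encoder has already produced per-token hidden states plus an attention mask (1 for real tokens, 0 for padding); nested lists stand in for tensors, and whether BOOM appends a special end-of-sequence token before pooling is not covered here.

```python
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return the hidden state of the last real token in each sequence.

    hidden_states: [batch][seq_len][hidden_dim]
    attention_mask: [batch][seq_len], 1 = real token, 0 = padding.
    Handles both left- and right-padded batches. A sequence whose mask
    is all zeros raises ValueError (max() of an empty sequence).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Position of the last token whose mask value is 1.
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(list(states[last]))
    return pooled


# Toy batch: first row left-padded, second row right-padded.
hidden = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]],
    [[1.0, 0.0], [4.0, 5.0], [9.0, 9.0]],
]
mask = [[0, 1, 1], [1, 1, 0]]
print(last_token_pool(hidden, mask))  # [[2.0, 3.0], [4.0, 5.0]]
```

With left padding (as configured in the Flash Attention example later on this page via `padding_side="left"`), the last real token always sits at the final position, so pooling reduces to taking index -1 of every sequence.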

The key innovation is in training, not in model design. Five embedding models are trained on different random subsets (20%, 40%, 60%, 80%, and 100%) of the full 2.8M multi-task corpus. They are then merged with Multi-SLERP (multi-model spherical linear interpolation) using weighted coefficients (0.2, 0.4, 0.6, 0.8, 1.0). The resulting model keeps the inference cost of a single dense 4B network while capturing the variance-reducing benefits of bagging. According to the authors, this avoids the OOD generalization limitations of standard batch-level shuffling and supports incremental updates without full retraining.
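To make the merging step concrete, here is a simplified sketch of a weighted spherical average applied to flattened weight tensors, using the five coefficients quoted above. It is an approximation (a normalized weighted mean of unit directions, NLERP-style, rescaled to the weighted mean norm), not the exact Multi-SLERP algorithm used by the authors or by any merging library; `spherical_merge` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

# Merge coefficients quoted in the text (one per bagged model).
COEFFS = [0.2, 0.4, 0.6, 0.8, 1.0]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale coefficients so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def l2(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def spherical_merge(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Approximate weighted spherical average of flattened weight tensors.

    Directions are averaged on the unit sphere and the result is rescaled
    to the weighted mean of the input norms. Simplification only: zero
    vectors or directions that cancel out will divide by zero.
    """
    ws = normalize(weights)
    norms = [l2(v) for v in vectors]
    direction = [0.0] * len(vectors[0])
    for v, n, w in zip(vectors, norms, ws):
        for i, x in enumerate(v):
            direction[i] += w * x / n  # accumulate weighted unit vectors
    d = l2(direction)
    target_norm = sum(w * n for w, n in zip(ws, norms))
    return [x / d * target_norm for x in direction]


# Five toy "models", each reduced to one 3-element parameter vector.
toy_models = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.7, 0.3, 0.1], [0.6, 0.4, 0.2]]
print(spherical_merge(toy_models, COEFFS))
```

In a real merge the same operation would be applied tensor by tensor across the five checkpoints' state dicts, producing one checkpoint with the original architecture.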

The model is distributed in float32 precision, which accounts for the 16.1 GB download. It can be quantized (e.g., to 8-bit or 4-bit) with common inference tooling, provided that tooling supports the Qwen3 architecture.
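Weight memory can be estimated as parameters × bits per weight ÷ 8, which reproduces both the ~16 GB float32 download and the FP16 row of the hardware table further down. In the sketch below, the Q8_0 figure follows from its layout (32 int8 weights plus a 2-byte scale per 34-byte block); the Q4_K_M figure is an assumed average, since that format mixes quant types. Activation memory for long inputs comes on top of these numbers.

```python
# Approximate bits per weight; the Q4_K_M value is an assumed average.
BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16/BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}

PARAMS = 4.0e9  # nominal parameter count


def weight_gb(params: float, bits: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), excluding activations."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:10s} ~{weight_gb(PARAMS, bits):.1f} GB")
```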

Capabilities & Use Cases

BOOM_4B_v1 is a text-only embedding model designed for:

Information retrieval / RAG – dense retrieval over large document collections where query-document similarity is computed via cosine similarity on embeddings.
Semantic textual similarity (STS) – measuring how similar two sentences or paragraphs are.
Text classification – embedding-based classification using nearest neighbor or linear probes.
Clustering – grouping documents by semantic content.
Reranking – rescoring candidate documents from a first-stage retrieval.
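For the retrieval and reranking uses listed above, the core operation is ranking documents by cosine similarity to a query embedding. The following dependency-free sketch shows that step; the short vectors are hypothetical placeholders for `model.encode(...)` output, and `rank` is an illustrative helper, not part of any library.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query: Sequence[float], docs: Sequence[Sequence[float]], top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real model output.
query_vec = [0.1, 0.9, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.2, 0.8, 0.1]]
print(rank(query_vec, doc_vecs, top_k=2))
```

With real embeddings you would typically call `model.encode(texts, normalize_embeddings=True)`, after which a plain dot product equals cosine similarity; Sentence Transformers also provides `util.cos_sim` for the same computation on tensors.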

Training data covered retrieval (MS MARCO, NQ, HotpotQA, FEVER, FiQA, etc.), reranking (StackOverflowDupQuestions), classification (Amazon Reviews, Banking77, IMDB), clustering (arXiv, Reddit), STS (STS12-22), and code (CoRNStack: JavaScript, Java, Python, PHP, Ruby). The training mix is English-dominant, but the model can also encode other languages represented in it (e.g., via MIRACL and Mr. TyDi). It does not generate text; it produces fixed-length vectors.

Concrete use cases:

Building a local RAG pipeline for a domain-specific knowledge base (e.g., legal documents, internal wikis) with high OOD robustness.
Semantic search in a corpus of mixed sources where retrieval quality must degrade gracefully when queries differ from training data.
Foundation for embedding-based classifiers where labeled data is scarce, leveraging the 4B model's representations.
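When labeled data is scarce, one of the simplest embedding-based classifiers is a nearest-centroid model: average the normalized embeddings per label, then assign new texts to the closest centroid by cosine similarity. It is a lightweight stand-in for the nearest-neighbor or linear-probe approaches mentioned earlier. The sketch below assumes embeddings are already computed; the vectors and the labels "billing" and "tech" are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fit_centroids(embeddings: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Per-label direction of the mean of L2-normalized embeddings."""
    sums: Dict[str, List[float]] = {}
    for emb, label in zip(embeddings, labels):
        e = _normalize(emb)
        acc = sums.setdefault(label, [0.0] * len(e))
        for i, x in enumerate(e):
            acc[i] += x
    # Renormalizing the summed vector gives the direction of the class mean.
    return {label: _normalize(acc) for label, acc in sums.items()}


def predict(embedding: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid has the highest cosine similarity."""
    e = _normalize(embedding)
    return max(centroids, key=lambda lab: sum(a * b for a, b in zip(e, centroids[lab])))


# Hypothetical few-shot training set.
train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = ["billing", "billing", "tech", "tech"]
centroids = fit_centroids(train, labels)
print(predict([0.8, 0.2], centroids))  # billing
```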

Running BOOM_4B_v1 Locally

Hardware Requirements

At 4B parameters, BOOM_4B_v1 is a mid-size model: it fits on many consumer GPUs at FP16 and on smaller ones when quantized.

Quantization	VRAM (approximate, weights only; long inputs need more)	Recommended Hardware
FP16	~8 GB	RTX 3090 / 4090, M4 Max (64GB unified)
Q8_0	~5 GB	RTX 3080 12GB, RTX 4060 Ti 16GB
Q4_K_M	~3 GB	RTX 3060 12GB, M4 Pro, Apple M2 Ultra
Q3_K_S	~2.5 GB	RTX 2060 6GB (tight)

Realistic GPU: An RTX 4090 24GB can run FP16 with headroom for batching or longer contexts. An RTX 3060 12GB handles Q4_K_M well. Apple Silicon users with unified memory >16GB can run Q4_K_M comfortably.

Inference performance (estimated, Q4_K_M, batch size 1 on an RTX 4090): ~200-300 tokens per second for encoding. Throughput scales with batch size: a batch of 32 documents of 512 tokens each should process at roughly 5,000 tokens/sec.
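Because these figures are estimates, it is worth measuring throughput on your own hardware. Below is a small, framework-agnostic timing harness; `measure_throughput` is a hypothetical helper that takes any encode function and token counter, and the commented usage assumes the Sentence Transformers model from the Quick Start section.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(
    encode: Callable[[List[str]], object],
    texts: Sequence[str],
    count_tokens: Callable[[str], int],
    batch_size: int = 32,
) -> float:
    """Return encoded tokens per second over all batches."""
    total_tokens = sum(count_tokens(t) for t in texts)
    encode(list(texts[:batch_size]))  # warm-up call, excluded from timing
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        encode(list(texts[i : i + batch_size]))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total_tokens / elapsed


# Hypothetical usage with a loaded SentenceTransformer `model`:
# tps = measure_throughput(
#     lambda batch: model.encode(batch, batch_size=32),
#     docs,
#     lambda t: len(model.tokenizer(t)["input_ids"]),
# )

# Self-contained demo with a no-op encoder and whitespace token counting.
print(measure_throughput(lambda batch: None, ["a b c", "d e"], lambda t: len(t.split()), batch_size=1) > 0)
```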

Quick Start with Sentence Transformers

If a GGUF build of the model is available, Ollama can run it, either pulled from the Ollama library or imported locally with a custom Modelfile. Otherwise, Sentence Transformers loads the weights directly from the Hugging Face repository:

```python
from sentence_transformers import SentenceTransformer

# Downloads the float32 weights (~16 GB) from Hugging Face on first use.
model = SentenceTransformer("ICT-TIME-and-Querit/BOOM_4B_v1")
embeddings = model.encode(["Your text here"])
```

For speed on supported NVIDIA GPUs, enable Flash Attention 2 (requires the flash-attn package):

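```python
import torch
from sentence_transformers import SentenceTransformer

# FA2 runs in fp16/bf16, so load in bfloat16 (the checkpoint ships as float32).
model = SentenceTransformer(
    "ICT-TIME-and-Querit/BOOM_4B_v1",
    model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.bfloat16, "device_map": "auto"},
    tokenizer_kwargs={"padding_side": "left"},  # keeps the last real token at the end for last-token pooling
)
```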

Quantization Recommendations

Best quantization for most users: Q4_K_M offers the best tradeoff between quality retention (an estimated ~98% of FP16) and VRAM (~3 GB). Avoid Q3_K_S for production retrieval; its additional precision loss may degrade OOD performance. If your hardware can spare ~5 GB, Q8_0 is a safer choice than aggressive 4-bit quantization.
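To illustrate why fewer bits cost more precision, here is a simplified symmetric per-block quantizer that rounds weights to a given bit width and measures the reconstruction error. It loosely resembles Q8_0's per-32-weight absmax scaling but is not llama.cpp's actual Q8_0, Q4_K, or Q3_K format; the Gaussian "weights" are a hypothetical stand-in for a real tensor.

```python
import random
from typing import List, Sequence


def quantize_dequantize(weights: Sequence[float], bits: int, block_size: int = 32) -> List[float]:
    """Symmetric per-block absmax quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    out: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start : start + block_size]
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = max(abs(w) for w in block) / qmax or 1.0
        out.extend(round(w / scale) * scale for w in block)
    return out


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4096)]
for bits in (8, 4, 3):
    print(bits, "bits -> RMSE", f"{rmse(w, quantize_dequantize(w, bits)):.2e}")
```

The error grows as the bit width drops, which is the effect the recommendation above warns about for Q3_K_S; the real K-quant formats reduce it with more elaborate block and scale layouts.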

How It Compares

vs. gte-Qwen2-1.5B-instruct (1.5B)

Size: BOOM is 2.7x larger. Expect higher representational capacity and better performance on complex retrieval tasks.
OOD robustness: BOOM’s bagged training gives it a measurable edge in OOD benchmarks per the paper. gte-Qwen2 uses standard multi-task training.
Speed: gte-Qwen2 is faster (lower latency, higher throughput) due to smaller size. If throughput is more critical than marginal accuracy gains, the 1.5B model may suffice.
When to choose BOOM: When you need the best possible retrieval quality on diverse, unseen data and have the GPU memory to spare.

vs. intfloat/e5-mistral-7b-instruct (7B)

Size: BOOM is a little over half the size (4B vs. 7B). It runs on lower-end hardware (RTX 3060 vs. RTX 4090 for FP16).
Performance: e5-mistral-7b has historically ranked near the top of MTEB thanks to its larger scale and instruction tuning for retrieval. BOOM's OOD focus may close the gap on specific unseen domains.
When to choose BOOM: If you have limited VRAM (3-5 GB) and want a high-quality embedder, BOOM is the better fit. If you have 16+ GB and want the larger model's leaderboard scores, e5-mistral-7b is the heavier option.

Bottom line: BOOM_4B_v1 occupies a sweet spot: strong performance with a VRAM footprint that fits most modern consumer GPUs when quantized. Its bagging-based training is a practical alternative to increasing model size for robustness. For local RAG deployments where hardware is a constraint and OOD quality matters, it is a strong candidate.

