Mistral AI's compact 3B-parameter open-weights audio-language model built on Ministral-3B with a Whisper-derived encoder, designed for transcription, audio Q&A, summarization, and function-calling from voice across 8+ languages.
Architecture: Multimodal audio-language model combining a Whisper-style audio encoder with the Ministral-3B decoder LLM through a multi-modal projector (multi_modal_projector.linear_1 / linear_2). Implemented in Hugging Face Transformers as VoxtralForConditionalGeneration (≥ 4.54.0).
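A minimal loading sketch in Transformers is shown below; the checkpoint id mistralai/Voxtral-Mini-3B-2507 and the module inspection loop are illustrative assumptions rather than details stated in this section.

```python
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

repo_id = "mistralai/Voxtral-Mini-3B-2507"  # assumed checkpoint id; substitute the actual repo

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the weights near the ~9.5 GB figure quoted below
    device_map="cuda",
)

# List the projector layers that bridge the Whisper-style encoder and the Ministral-3B decoder.
for name, module in model.named_modules():
    if "multi_modal_projector" in name:
        print(name, type(module).__name__)
```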
Modality: Audio + text in / text out. 32k token context window; supports audio up to ~30 minutes for transcription and ~40 minutes for understanding.
Capabilities: Long-form speech transcription, audio question answering and summarization, and function-calling triggered directly from speech, with multilingual coverage across 8+ languages.
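A hedged sketch of the audio Q&A / summarization flow, continuing from the loading snippet above; the file name and prompt are placeholders, and the apply_chat_template call pattern follows the published Transformers integration as I understand it, so verify it against the model card.

```python
# Ask a question about a recording: audio + text in, text out.
# "meeting.mp3" and the prompt are placeholders.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "path": "meeting.mp3"},
            {"type": "text", "text": "Summarize the key decisions in this recording."},
        ],
    }
]

inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(model.device, dtype=torch.bfloat16)

# The 32k-token context window is what allows long recordings (~40 min for understanding).
outputs = model.generate(**inputs, max_new_tokens=500)
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```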
Training & deployment: Released July 15, 2025 (paper arXiv 2507.13264). ~9.5 GB GPU RAM in bf16/fp16; runs on a single GPU. Distributed under Apache 2.0 with optimized hosted inference on Mistral's La Plateforme (Voxtral Mini Transcribe variant from $0.001/audio-minute).
Use cases: Cost-sensitive transcription, voice agents, multilingual meeting/call transcripts, edge and local deployments where Whisper-class quality is needed at sub-Whisper cost.