Useful Sensors

Moonshine Streaming Medium

A 245M-parameter streaming English ASR model from Useful Sensors designed for low-latency, on-device transcription. Uses an 'ergodic' sliding-window encoder and achieves better WER than Whisper Large v3 on the OpenASR Leaderboard at a fraction of the size.

0.245B params · Dense


Our Take

Best for: Open-source ASR workloads

A strong 0.245B-parameter dense audio model from Useful Sensors. Treat the modality benchmarks on this page as the leading indicator of fit, since composite scoring across modalities is still maturing. On the rise in download charts.

Generated from this model’s benchmarks and ranking signals. Editor reviews refine it over time.

Model Specifications

Parameters: 0.245B

Architecture: Dense

Provider: Useful Sensors

Download Size: 4.1 GB

Community

Monthly Downloads: 13.7K

Likes: 22

Last Updated: 4 months ago

Quick Start

Download from Hugging Face

Access model weights, configuration files, and documentation.


License

MIT

Performance & Scoring

Benchmarks

WER: 6.7%

MBA Open Score: 72.2 (AA)

Benchmark (40%): 86.7

Popularity (25%): 43.9

Efficiency (25%): 78.3

Versatility (10%): 70.0
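
The four component weights add up to 100%, and a plain weighted average of the sub-scores reproduces the headline figure. The sketch below checks that arithmetic. Treating the score as a linear combination is an inference from the numbers shown here, not a documented formula, and the names `COMPONENTS` and `composite_score` are illustrative.

```python
# Component scores and weights as listed on this page.
COMPONENTS = {
    "benchmark": (86.7, 0.40),
    "popularity": (43.9, 0.25),
    "efficiency": (78.3, 0.25),
    "versatility": (70.0, 0.10),
}


def composite_score(components):
    """Weighted sum of (score, weight) pairs; weights must sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(score * weight for score, weight in components.values())


score = composite_score(COMPONENTS)
print(round(score, 1))  # 72.2
```

Because Popularity carries a quarter of the weight, the composite sits well below the raw Benchmark sub-score of 86.7.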

Hardware Compatibility

See which devices can run this model and at what quality level.

102 devices (first 25 shown)

Device	Vendor	Tier	Memory
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	SS	0.7 GB
Acer Veriton GN100 AI Mini	Acer	SS	0.7 GB
AMD Instinct MI300X	AMD	SS	0.7 GB
AMD Instinct MI325X	AMD	SS	0.7 GB
AMD Instinct MI355X	AMD	SS	0.7 GB
AMD Radeon RX 7600 8GB	AMD	SS	0.7 GB
AMD Radeon RX 7700 XT	AMD	SS	0.7 GB
AMD Radeon RX 7800 XT	AMD	SS	0.7 GB
AMD Radeon RX 7900 XT	AMD	SS	0.7 GB
AMD Radeon RX 7900 XTX	AMD	SS	0.7 GB
AMD Radeon RX 9070	AMD	SS	0.7 GB
AMD Radeon RX 9070 XT	AMD	SS	0.7 GB
Apple M3 Ultra (32-core CPU, 80-core GPU)	Apple	SS	0.7 GB
Apple M4	Apple	SS	0.7 GB
Apple M4 Max (40-core GPU)	Apple	SS	0.7 GB
Apple M4 Pro (14-core CPU, 20-core GPU)	Apple	SS	0.7 GB
Apple M5	Apple	SS	0.7 GB
Apple M5 Max (18-core CPU, 40-core GPU)	Apple	SS	0.7 GB
Apple M5 Pro (18-core CPU, 20-core GPU)	Apple	SS	0.7 GB
Apple Mac Mini (M1, 2020)	Apple	SS	0.7 GB
Apple Mac Mini (M2, 2023)	Apple	SS	0.7 GB
Apple Mac Mini (M2 Pro, 2023)	Apple	SS	0.7 GB
Apple Mac Mini (M4, 2024)	Apple	SS	0.7 GB
Apple Mac Mini (M4 Pro, 2024)	Apple	SS	0.7 GB
Apple Mac Studio (M1 Max, 2022)	Apple	SS	0.7 GB

Rent in the Cloud

Cheapest current cloud rentals with at least 1 GB VRAM, refreshed hourly.

Option	Cost / GPU-hour
NVIDIA GeForce RTX 5070 Ti (Vast.ai · Spot · 16 GB VRAM)	$0.11
NVIDIA GeForce RTX 3070 (RunPod · Community · 8 GB VRAM)	$0.13
NVIDIA GeForce RTX 3070 (RunPod · Spot · 8 GB VRAM)	$0.13
NVIDIA GeForce RTX 4090 (Vast.ai · Spot · 24 GB VRAM)	$0.13
NVIDIA GeForce RTX 4090 (Vast.ai · On-Demand · 24 GB VRAM)	$0.13

Per-GPU rate across RunPod and the Vast.ai marketplace.

Spot tier is interruptible. Plan for restarts when comparing against on-demand prices.


About This Model

Overview

Moonshine Streaming Medium is a 245M-parameter streaming English automatic speech recognition (ASR) model from Useful Sensors, designed to bring low-latency, on-device transcription to edge hardware. The model achieves a word error rate of 6.65% on the OpenASR Leaderboard — outperforming Whisper Large v3 despite being roughly 6× smaller. It’s purpose-built for real-time voice applications where latency, privacy, and local execution matter more than cloud connectivity.

The model is part of the broader Moonshine Voice framework, an open-source toolkit released under the MIT license by a team that includes founding members of the TensorFlow project. Moonshine Streaming Medium targets developers building voice agents, live captioning, smart assistants, and interactive systems that must respond while the user is still speaking. It is a dense model optimized for streaming inference rather than batch processing.

Architecture & Technical Details

Moonshine Streaming Medium uses a sequence-to-sequence Transformer with a novel “ergodic” sliding-window encoder. The encoder processes audio in overlapping chunks using bounded local attention and no positional embeddings — positional information is injected via an adapter before the autoregressive decoder. This design eliminates the need to wait for the full audio clip before starting transcription, enabling sub-100ms streaming latency on modest hardware.

Key architectural characteristics:

50 Hz audio frontend that converts raw waveform into a compact feature stream.
Sliding-window Transformer encoder that caches state between windows, avoiding redundant computation.
Autoregressive decoder with standard cross-attention, generating text tokens incrementally.
Support for Hugging Face Transformers (pipeline tag: automatic-speech-recognition), plus native C API and pre-built packages for iOS, Android, Python, macOS, Windows, Linux, Raspberry Pi, and wearables.
Single-language model (English only) trained from scratch on diverse speech datasets.
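
To make the bounded local attention described above concrete, here is a minimal sketch of a sliding-window attention mask over 50 Hz feature frames. The window sizes `LEFT_CONTEXT` and `RIGHT_CONTEXT` are hypothetical values chosen for illustration and are not the model's actual configuration. Only the 50 Hz frame rate comes from this page.

```python
FRAME_RATE_HZ = 50   # from the list above: one feature frame every 20 ms
LEFT_CONTEXT = 4     # hypothetical: past frames each position may attend to
RIGHT_CONTEXT = 1    # hypothetical: future frames (lookahead)


def num_frames(duration_s, frame_rate_hz=FRAME_RATE_HZ):
    """Number of feature frames emitted for a clip of the given length."""
    return round(duration_s * frame_rate_hz)


def local_attention_mask(n, left=LEFT_CONTEXT, right=RIGHT_CONTEXT):
    """mask[i][j] is True when frame i may attend to frame j."""
    return [[(i - left) <= j <= (i + right) for j in range(n)] for i in range(n)]


def lookahead_ms(right=RIGHT_CONTEXT, frame_rate_hz=FRAME_RATE_HZ):
    """Delay one layer adds by waiting for its future frames."""
    return right * 1000 / frame_rate_hz


mask = local_attention_mask(num_frames(0.2))  # 0.2 s of audio
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The mask depends only on the offset between two frames, never on their absolute position, which is consistent with an encoder that uses no positional embeddings. Each row has at most `LEFT_CONTEXT + RIGHT_CONTEXT + 1` allowed entries, so per-frame cost stays constant however long the stream runs.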

The context window is effectively unbounded for streaming: the model processes audio indefinitely by shifting the attention window forward. No context length is specified, but practical usage suggests it handles continuous streams of at least several minutes without degradation.
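
The reason a shifting window can run indefinitely is that the state it keeps is bounded. The sketch below illustrates this with a fixed-length rolling buffer. It is a conceptual stand-in, not the model's actual cache, and `RollingEncoderCache`, `stream`, and the window length of 16 frames are all hypothetical.

```python
from collections import deque


class RollingEncoderCache:
    """Keeps only the most recent `window` frames, however long the stream runs."""

    def __init__(self, window):
        self.frames = deque(maxlen=window)  # oldest frames drop off automatically
        self.total_seen = 0

    def push(self, chunk):
        """Append new frames and return the context visible to the next step."""
        for frame in chunk:
            self.frames.append(frame)
            self.total_seen += 1
        return list(self.frames)


def stream(frames, chunk_size):
    """Yield successive fixed-size chunks, standing in for live microphone input."""
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


cache = RollingEncoderCache(window=16)            # hypothetical window length
five_minutes = list(range(5 * 60 * 50))           # frame indices at 50 Hz
for chunk in stream(five_minutes, chunk_size=5):  # 5 frames = 100 ms per step
    context = cache.push(chunk)

print(cache.total_seen, len(context))  # 15000 16
```

After five minutes of simulated audio the buffer still holds 16 frames, so memory use does not grow with stream length.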

Capabilities & Use Cases

Moonshine Streaming Medium is built for one thing: converting English speech to text in real time on local hardware. It excels at:

Live voice agents and conversational AI – Sub-100 ms streaming latency on capable hardware makes it suitable for dialogue systems that transcribe speech while the user is still talking.
On-device captioning – Run directly on a laptop, tablet, or Raspberry Pi to generate closed captions for meetings, lectures, or live events without sending audio to the cloud.
Voice-controlled smart devices – IoT, wearables, and home automation endpoints can use the 34M–245M parameter family; the Medium variant is appropriate for devices with at least 512 MB of RAM.
Privacy-sensitive transcription – Since everything runs locally, no audio leaves the device. Use cases include medical dictation, legal transcription, and personal assistant applications.
Prototyping and research – The MIT license and Hugging Face integration allow easy experimentation with customizable voice pipelines.

The model is not suited for multilingual transcription, music/sound event detection, or offline batch processing of very long recordings (non-streaming models like Whisper may be more efficient for that).

Running Moonshine Streaming Medium Locally

The tiny parameter count makes Moonshine Streaming Medium extraordinarily accessible. At 0.245B parameters, even the FP32 weights consume roughly 1 GB of VRAM / system memory. Quantization drops this further:

Quantization	Approximate VRAM (weights only)	Typical Use Case
FP32 (unquantized)	~1.0 GB	Maximum accuracy on desktop/laptop
FP16	~0.5 GB	Good quality, runs on most integrated GPUs
Q8_0 (8-bit)	~0.25 GB	Balanced for older hardware
Q4_K_M (4-bit)	~0.15 GB	Fits on smartphones, Raspberry Pi 4+
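
Weight memory follows directly from parameter count times bits per weight. The sketch below works through that arithmetic for nominal bit widths. It covers weights only, so activations, caches, and runtime buffers come on top. Real quantized formats also store per-block scale factors, so actual files run somewhat larger than the nominal 8-bit and 4-bit figures.

```python
PARAMS = 0.245e9  # parameter count stated on this page

# Nominal bits per weight, ignoring per-block scale overhead.
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def weight_gb(params, bits):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_gb(PARAMS, bits):.3f} GB")
```

Halving the bit width halves the weight footprint, so an 8-bit build is necessarily smaller than an FP16 build of the same model.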

Recommended hardware:

Consumer GPU: RTX 3060 or better is overkill – even integrated GPUs in recent Intel/AMD processors handle it easily. An RTX 4090 will give near-instantaneous token generation but is unnecessary.
Apple Silicon: M1 or later (M4 Max included) runs the model in <100ms per utterance using Core ML or the Moonshine Python package.
Edge devices: Raspberry Pi 4/5 with 2 GB+ RAM can run the Q4_K_M variant at approximately 2–3× faster than real time (e.g., transcribing 10 seconds of audio in 3–4 seconds). For real-time streaming with more headroom, the Tiny (34M) or Small (123M) variants are preferred.
CPU-only: Works on any modern x86 or ARM CPU – expect throughput between 1× and 4× real time depending on hardware.

Performance expectations: On an RTX 3090, the decoder generates roughly 50–80 tokens per second, while the encoder takes about 10–20 ms per 30 ms audio window (a real-time factor of roughly 0.3–0.7). End-to-end latency from audio input to first word is typically below 100 ms on a mid-range GPU.
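
Real-time factor (RTF) is processing time divided by audio duration, so any value below 1.0 means the model keeps up with live input. The sketch below applies that definition to the encoder timings quoted above. The function names are illustrative.

```python
def real_time_factor(processing_ms, audio_ms):
    """RTF = processing time / audio duration; below 1.0 keeps up with live audio."""
    if audio_ms <= 0:
        raise ValueError("audio_ms must be positive")
    return processing_ms / audio_ms


def speedup(rtf):
    """How many times faster than real time a given RTF corresponds to."""
    return 1.0 / rtf


WINDOW_MS = 30  # audio window length quoted above
for encoder_ms in (10, 20):
    rtf = real_time_factor(encoder_ms, WINDOW_MS)
    print(f"{encoder_ms} ms per {WINDOW_MS} ms window: "
          f"RTF {rtf:.2f}, {speedup(rtf):.1f}x real time")
```

The same formula covers the Raspberry Pi figure: 10 seconds of audio processed in 4 seconds is an RTF of 0.4, or 2.5× real time.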

Quickest path to run locally: Install the Moonshine Python package with pip, using the package name given in the [GitHub](https://github.com/usefulsensors/moonshine) README. Alternatively, use Hugging Face Transformers with MoonshineStreamingForConditionalGeneration as shown in the model card. No API keys or cloud accounts are needed.

How It Compares

vs. Whisper Large v3 – Moonshine Streaming Medium beats Whisper Large v3 on the OpenASR Leaderboard (6.65% WER vs. ~7.4%) while being roughly 6× smaller (245M vs. about 1.55B parameters). Whisper is more robust to noisy environments and supports 99 languages; Moonshine is strictly English. For streaming, Moonshine’s sliding-window encoder provides lower latency and lower memory overhead. If you need multilingual speech recognition, Whisper remains the better choice.
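
For readers unfamiliar with the metric behind these numbers, word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal implementation with made-up example sentences. Leaderboards typically normalize casing and punctuation before scoring, and that step is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


ref = "turn on the kitchen lights please"
hyp = "turn on kitchen light please"
print(f"{word_error_rate(ref, hyp):.3f}")  # 0.333
```

Here one deleted word and one substituted word against six reference words give a WER of 2/6. A 6.65% WER corresponds to roughly one error in every 15 words.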

vs. other edge ASR models (e.g., Paraformer-Large, Wav2Vec2-Large) – Moonshine Medium’s 245M parameters place it between small and medium edge ASR models. Paraformer-Large (~220M) offers competitive WER but is not designed for streaming. Moonshine’s explicit streaming architecture gives it a latency advantage. Wav2Vec2-Large (300M) requires fine-tuning for transcription (CTC-based) and lacks native streaming support; Moonshine is ready out of the box.

When to choose Moonshine Streaming Medium: You need real-time English transcription on a device with limited compute (Raspberry Pi, phone, low-end PC) and you want accuracy comparable to top cloud ASR APIs without sending data off-device. The MIT license and cross-platform support make it a low-friction option for production voice pipelines.

