What Hardware Runs PersonaPlex 7B | AI OS Speech to Speech

Quantization Options

See how different quantization levels affect VRAM requirements and quality for this model.

Format	VRAM Required	Quality	Notes
Q2_K	3.3 GB	Low	Aggressive quantization: smallest size, noticeable quality loss
Q4_K_M (recommended)	4.8 GB	Good	Best balance of size and quality for most use cases
Q5_K_M	5.5 GB	Very Good	Slightly better quality than Q4 with a moderate size increase
Q6_K	6.3 GB	Excellent	Near-lossless quality with manageable size
Q8_0	8.1 GB	Near Perfect	Virtually indistinguishable from full precision
FP16	14.7 GB	Full	Full 16-bit floating point: maximum quality, largest size
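
As a rough illustration of how to read the table, the sketch below picks the highest-quality format that fits a given VRAM budget. The VRAM figures are copied from the table; the function name `best_quant` and the `headroom_gb` margin are hypothetical, and the sketch assumes the listed figure is the whole memory requirement, which may not hold for long contexts or a GPU shared with other workloads.

```python
# VRAM figures copied from the table above, ordered from lowest to highest quality.
QUANT_VRAM_GB = [
    ("Q2_K", 3.3),
    ("Q4_K_M", 4.8),
    ("Q5_K_M", 5.5),
    ("Q6_K", 6.3),
    ("Q8_0", 8.1),
    ("FP16", 14.7),
]


def best_quant(vram_gb, headroom_gb=0.0):
    """Return the highest-quality format that fits, or None if nothing fits.

    headroom_gb is a hypothetical safety margin for anything else using the GPU.
    """
    usable = vram_gb - headroom_gb
    fitting = [name for name, need in QUANT_VRAM_GB if need <= usable]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (4, 8, 12, 16, 24):
        print(f"{budget} GB -> {best_quant(budget)}")
```

Under these assumptions an 8 GB card lands on Q6_K, since Q8_0 needs 8.1 GB; reserving 2 GB of headroom drops the same card to Q5_K_M.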

Hardware Compatibility

See which devices can run this model and at what quality level.


102 devices; the first 25 are listed here.


Device	Vendor	Tier	Speed	VRAM Required
AMD Radeon RX 7600 8GB	AMD	S	48.4 tok/s	4.8 GB
NVIDIA GeForce RTX 4060	NVIDIA	S	45.7 tok/s	4.8 GB
NVIDIA GeForce RTX 5060 Ti 8GB	NVIDIA	S	75.3 tok/s	4.8 GB
AMD Radeon RX 7700 XT	AMD	S	72.6 tok/s	4.8 GB
Intel Arc B580	Intel	S	76.6 tok/s	4.8 GB
NVIDIA GeForce RTX 4070	NVIDIA	S	84.7 tok/s	4.8 GB
NVIDIA GeForce RTX 4070 SUPER	NVIDIA	S	84.7 tok/s	4.8 GB
NVIDIA GeForce RTX 5070	NVIDIA	S	112.9 tok/s	4.8 GB
ACEMAGIC M1A Pro (i9-13900HK + ARC A770)	ACEMAGIC	A	86.1 tok/s	4.8 GB
AMD Radeon RX 7800 XT	AMD	A	104.9 tok/s	4.8 GB
AMD Radeon RX 9070	AMD	A	107.6 tok/s	4.8 GB
AMD Radeon RX 9070 XT	AMD	A	107.6 tok/s	4.8 GB
Google Cloud TPU v5e	Google	A	137.7 tok/s	4.8 GB
Intel Arc A770 16GB	Intel	A	94.1 tok/s	4.8 GB
NOVATECH AI Workstation (i9-14900K + RTX 5080)	NOVATECH	A	161.4 tok/s	4.8 GB
NVIDIA GeForce RTX 4060 Ti 16GB	NVIDIA	A	48.4 tok/s	4.8 GB
NVIDIA GeForce RTX 4070 Ti SUPER	NVIDIA	A	112.9 tok/s	4.8 GB
NVIDIA GeForce RTX 4080 SUPER	NVIDIA	A	123.7 tok/s	4.8 GB
NVIDIA GeForce RTX 5060 Ti 16GB	NVIDIA	A	75.3 tok/s	4.8 GB
NVIDIA GeForce RTX 5070 Ti	NVIDIA	A	150.6 tok/s	4.8 GB
NVIDIA GeForce RTX 5080 Founders Edition	NVIDIA	A	161.4 tok/s	4.8 GB
AMD Radeon RX 7900 XT	AMD	A	134.5 tok/s	4.8 GB
AMD Radeon RX 7900 XTX	AMD	A	161.4 tok/s	4.8 GB
NVIDIA GeForce RTX 3090	NVIDIA	A	157.3 tok/s	4.8 GB
NVIDIA GeForce RTX 4090 Founders Edition	NVIDIA	A	169.4 tok/s	4.8 GB


Run Locally vs API

Energy cost on AMD Radeon RX 7600 8GB (~48 tok/s, Q4_K_M) vs flagship API pricing.

Source	Details	Cost per 1M tokens
Local (energy only)	PersonaPlex 7B on AMD Radeon RX 7600 8GB · ~48 tok/s · 165 W	$0.114
GPT-5.5	OpenAI · in $5.00 · out $30.00	$12.50
Claude Opus 4.7 Thinking	Anthropic · in $5.00 · out $25.00	$11.00
Gemini 3.5 Flash	Google · in $1.50 · out $9.00	$3.75
Grok 4.3	xAI · in $1.25 · out $2.50	$1.63

API prices blended at 70% input / 30% output.
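
The blend can be reproduced with a few lines of arithmetic. The sketch below assumes the blended figure is a simple weighted average of the listed input and output prices per 1M tokens; the function name `blended_price` is hypothetical.

```python
def blended_price(input_usd, output_usd, input_share=0.70):
    """Weighted average of per-1M-token input and output prices."""
    output_share = 1.0 - input_share
    return input_share * input_usd + output_share * output_usd


# (input, output) prices per 1M tokens, copied from the table above.
API_PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7 Thinking": (5.00, 25.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Grok 4.3": (1.25, 2.50),
}

if __name__ == "__main__":
    for name, (inp, out) in API_PRICES.items():
        print(f"{name}: ${blended_price(inp, out):.3f} per 1M tokens")
```

For GPT-5.5 this gives 0.7 × 5.00 + 0.3 × 30.00 = 12.50, matching the table; the Grok 4.3 blend works out to about 1.625, which the table shows rounded to $1.63.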

Hardware amortisation not included. Run the full ROI calculator for payback math.
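
The local energy-only figure can be approximated from the throughput and power draw in the table. The page does not state the electricity rate it uses; the $0.12 per kWh below is an assumption chosen because it roughly reproduces the $0.114 figure, and the function name is hypothetical. The sketch also assumes the card draws its full 165 W for the entire run.

```python
def energy_cost_per_million_tokens(tokens_per_second, watts, usd_per_kwh):
    """Electricity cost of generating 1M tokens at a constant power draw."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds to kilowatt-hours
    return kwh * usd_per_kwh


if __name__ == "__main__":
    # 48.4 tok/s and 165 W come from the tables above.
    # ASSUMPTION: 0.12 USD/kWh is not given on the page.
    cost = energy_cost_per_million_tokens(48.4, 165, usd_per_kwh=0.12)
    print(f"${cost:.3f} per 1M tokens")
```

At 48.4 tok/s, 1M tokens take roughly 5.7 hours, or just under 1 kWh at 165 W. Substitute your own tariff and measured wall power for a more realistic estimate.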


Rent in the Cloud

Cheapest current cloud rentals with at least 5 GB VRAM, refreshed hourly.

Option	Provider · Type · VRAM	Cost / GPU-hour
NVIDIA GeForce RTX 5060 Ti	Vast.ai · Spot · 16 GB VRAM	$0.07
NVIDIA GeForce RTX 5070	Vast.ai · Spot · 12 GB VRAM	$0.07
NVIDIA GeForce RTX 5060 Ti
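
To compare an hourly rental with the per-token figures in the Run Locally vs API section, the hourly price can be converted to a cost per 1M tokens. The sketch below assumes the rented card sustains the throughput listed for the same GPU in the Hardware Compatibility table, generates continuously with no idle time, and keeps its spot price for the whole run; none of this is guaranteed for a spot instance. The function name is hypothetical.

```python
def rental_cost_per_million_tokens(usd_per_gpu_hour, tokens_per_second):
    """Rental cost of generating 1M tokens at a constant throughput."""
    hours = 1_000_000 / tokens_per_second / 3600
    return usd_per_gpu_hour * hours


if __name__ == "__main__":
    # Hourly prices from the rental table; throughput from the hardware table.
    options = {
        "RTX 5060 Ti 16 GB": (0.07, 75.3),
        "RTX 5070": (0.07, 112.9),
    }
    for name, (price, tps) in options.items():
        cost = rental_cost_per_million_tokens(price, tps)
        print(f"{name}: ${cost:.3f} per 1M tokens")
```

Under these assumptions the two listed rentals come to roughly $0.26 and $0.17 per 1M tokens, above the local energy-only figure but well below the blended API prices.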

Related Models

Nemotron 3 Ultra

Nvidia Nemotron 3 Super

Nemotron 3 Nano Omni
