Whisper Model Sizes 2026: Tiny to Large V3 Turbo

Quick Answer

Tiny and Base

Fast and light, but often too error-prone for serious dictation.

Small

A practical minimum when memory is limited and you can tolerate some corrections.

Medium

A quality step up, but latency can interrupt real-time writing flow.

Large

High quality for files, heavy for interactive dictation.

Large V3 Turbo

A good first model to test for local dictation, if your app supports it.

Which Whisper Model Should You Choose for Dictation?

For dictation, the best Whisper model is the one that is accurate enough without breaking your flow. A three-second delay can slow writing more than one quick correction.

Spokenly Dictation Models settings showing local Whisper model choices, captured June 2026.

Real-time dictation

Large V3 Turbo, Small, or Medium

Pick the smallest model that keeps corrections low. Latency matters more than benchmark purity.

File transcription

Large V3 or Large V3 Turbo

Files can wait. Use a larger model when accuracy matters and processing time is acceptable.

Older Intel Mac

Small, Medium, or Turbo if it runs acceptably

Avoid assuming Large V3 will feel interactive. Test before making it your default.

Windows laptop

Small or Turbo

Thermals and memory pressure matter. A smaller model can produce a better writing flow.

English-only notes

small.en or medium.en

English-only models can be useful at smaller sizes. Large and Turbo are multilingual only.

Mixed-language dictation

Large V3 Turbo or Large V3

Use multilingual models when you switch languages or dictate names and phrases from multiple languages.
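The "pick the smallest model that keeps corrections low" rule can be sketched as a small lookup. This is a minimal illustration, not a benchmark: the function name, the correction threshold, and the sample numbers are all hypothetical, and the correction rates are meant to come from your own test dictation.

```python
# Smallest to largest by parameter count (Medium has 769M, Turbo 809M).
SIZE_ORDER = ["tiny", "base", "small", "medium", "turbo", "large-v3"]

def smallest_acceptable(corrections_per_100_words, max_corrections):
    """Return the smallest tested model at or under the correction limit."""
    for name in SIZE_ORDER:
        rate = corrections_per_100_words.get(name)
        if rate is not None and rate <= max_corrections:
            return name
    return None  # nothing tested was accurate enough

# Hypothetical numbers from a personal test session, not measured results.
my_results = {"small": 6.0, "medium": 3.5, "turbo": 2.0}
print(smallest_acceptable(my_results, max_corrections=4.0))
```

Parameter count is only a proxy for latency here. Turbo is slightly larger than Medium but is listed as faster, so it is worth timing both before settling.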

Spokenly includes local Whisper options for offline dictation and file transcription, plus cloud transcription through BYOK (bring your own key) when you want to compare local latency against managed model quality. See the Parakeet vs Whisper comparison for the broader local-model tradeoff.

Official Whisper Model Sizes

Parameters, English-only availability, required VRAM, and relative speed are the cleanest baseline for comparing Whisper models. For dictation, those numbers matter because they affect latency, memory use, and how quickly text returns after you stop speaking.

| Model | Parameters | English-only model | Required VRAM | Relative speed | Dictation fit |
| --- | --- | --- | --- | --- | --- |
| Tiny | 39M | tiny.en | ~1 GB | ~10x | Fast tests, low accuracy ceiling |
| Base | 74M | base.en | ~1 GB | ~7x | Short notes on weak hardware |
| Small | 244M | small.en | ~2 GB | ~4x | Practical minimum for many users |
| Medium | 769M | medium.en | ~5 GB | ~2x | Better accuracy, noticeable latency |
| Large | 1550M | No .en model | ~10 GB | 1x (baseline) | High quality, heavy for live use |
| Large V3 Turbo | 809M | No .en model | ~6 GB | ~8x | Strong default where supported |

Sources: OpenAI Whisper README and OpenAI Whisper Large V3 Turbo model card.
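As a rough illustration of how these figures can drive a first shortlist, the sketch below filters the listed models by a VRAM budget and orders them by relative speed. The helper name and the 6 GB budget are assumptions made for the example. The numbers are the approximate values listed above, with Large as the 1x speed baseline.

```python
# Approximate figures from the list above; treat them as rough guides.
WHISPER_MODELS = [
    # (name, parameters in millions, approx. VRAM in GB, relative speed)
    ("tiny", 39, 1, 10),
    ("base", 74, 1, 7),
    ("small", 244, 2, 4),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),   # speed baseline
    ("turbo", 809, 6, 8),
]

def models_within_budget(vram_gb):
    """Return model names whose listed VRAM fits the budget, fastest first."""
    fitting = [m for m in WHISPER_MODELS if m[2] <= vram_gb]
    fitting.sort(key=lambda m: m[3], reverse=True)
    return [m[0] for m in fitting]

print(models_within_budget(6))
```

A shortlist like this only narrows the options. CPU, Metal, CUDA, and app runtime choices can shift real memory use, so the final pick still needs a test on your own hardware.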

Why Whisper Size Numbers Differ

Whisper size numbers differ because people measure different artifacts: model parameters, runtime memory, PyTorch checkpoints, Hugging Face files, Core ML conversions, and quantized local files. Use parameters for model comparison and file size for disk planning.

Parameters

Parameters describe the model itself. Tiny has 39 million parameters; Large has about 1.55 billion. This is the most stable way to compare Whisper model sizes.

VRAM

VRAM is the memory needed to run the model on a GPU. OpenAI lists rough VRAM requirements, but CPU, Metal, CUDA, and app runtime choices can change the real number.

File size

File size depends on packaging. PyTorch checkpoints, Hugging Face files, Core ML conversions, GGML, GGUF, and quantized files can all show different MB or GB values for the same model family.

Quantization

Quantized local files shrink model size and memory use by storing weights with fewer bits. That can make Whisper usable on older machines, but quality and speed depend on the conversion.

For local apps, whisper.cpp model lists are useful because they show real local model artifacts rather than only research checkpoints. Check the whisper.cpp model directory when you need file sizes for GGML or GGUF-based runtimes.

Common whisper.cpp file sizes

| Model | Multilingual file | English-only file | Quantized example |
| --- | --- | --- | --- |
| tiny | 75 MiB | 75 MiB | No common q5 file listed in the core list |
| base | 142 MiB | 142 MiB | No common q5 file listed in the core list |
| small | 466 MiB | 466 MiB | No common q5 file listed in the core list |
| medium | 1.5 GiB | 1.5 GiB | No common q5 file listed in the core list |
| large-v3 | 2.9 GiB | No .en model | 1.1 GiB for large-v3-q5_0 |
| large-v3-turbo | 1.5 GiB | No .en model | 547 MiB for large-v3-turbo-q5_0 |
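The file sizes above follow roughly from parameter count multiplied by bits per weight. The sketch below is a back-of-the-envelope estimate, not how whisper.cpp computes anything. It assumes the unquantized files store 16-bit weights and that q5_0 averages about 5.5 bits per weight (32 weights packed into a 22-byte block). It ignores file metadata and any tensors kept at higher precision, so the quantized estimates come out somewhat below the listed sizes.

```python
def estimated_size_mib(params_millions, bits_per_weight):
    """Rough weight-storage size in MiB: parameters x bits per weight."""
    total_bytes = params_millions * 1_000_000 * bits_per_weight / 8
    return total_bytes / (1024 ** 2)

# Parameter counts are the figures OpenAI lists for each model.
for name, params in [("tiny", 39), ("large-v3", 1550), ("large-v3-turbo", 809)]:
    fp16 = estimated_size_mib(params, 16)    # assumed 16-bit weights
    q5_0 = estimated_size_mib(params, 5.5)   # assumed q5_0 average
    print(f"{name}: ~{fp16:.0f} MiB at 16-bit, ~{q5_0:.0f} MiB at q5_0")
```

For large-v3 the 16-bit estimate works out to just under 2.9 GiB, which lines up with the listed file. This is why parameters are the steadier comparison and file size depends on packaging.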

Best Whisper Model for File Transcription

For audio and video files, use a larger model when accuracy matters more than wait time. Large V3 Turbo is a strong default because it is fast; Large V3 can still make sense for important files where you are willing to wait and review.

File transcription also benefits from features outside the model: subtitle export, speaker labels, batch processing, and review UI. If you are converting MP3s, see the MP3 to text on Mac guide.
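Subtitle export is a good example of work that happens outside the model. The sketch below turns timed segments into SRT text. It assumes each segment is a dict with `start`, `end` (both in seconds), and `text`, which is the shape the openai-whisper Python package uses for its segments; the function names and the sample segments are made up for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments):
    """Build SRT text from dicts that carry 'start', 'end', and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

# Made-up sample segments, not real transcription output.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 61.04, "text": " This is a test."},
]
print(segments_to_srt(sample))
```

An app with a review UI would typically let you fix segment text before an export step like this runs.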

Older Macs and Windows Laptops

On older hardware, do not start with the largest model just because it has the best reputation. Thermal throttling, memory pressure, and slow CPU inference can make a smaller model more useful in daily writing.

Intel Macs

Try Small first, then Medium or Turbo if latency remains acceptable. Parakeet requires Apple Silicon in Spokenly, so Whisper is the local fallback on Intel Macs.

Windows PCs

Test battery, fan noise, and return time, not only transcript accuracy. A quick Small model can beat a slow Large model when you are writing in bursts.
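Return time is easy to measure yourself. The following is a minimal timing harness, assuming you can wrap whichever runtime you use in a single `transcribe(clip)` callable. The function name, the clip names, and the stand-in transcriber are hypothetical, so the sketch runs without any model installed.

```python
import statistics
import time

def measure_return_time(transcribe, clips, repeats=3):
    """Median wall-clock seconds for `transcribe` to return, per clip."""
    results = {}
    for clip in clips:
        timings = []
        for _ in range(repeats):
            started = time.perf_counter()
            transcribe(clip)
            timings.append(time.perf_counter() - started)
        results[clip] = statistics.median(timings)
    return results

# Stand-in transcriber; replace it with a call into your real runtime.
def fake_transcribe(clip):
    time.sleep(0.01)
    return f"text for {clip}"

timings = measure_return_time(fake_transcribe, ["short.wav", "long.wav"])
for clip, seconds in timings.items():
    print(f"{clip}: {seconds:.3f} s")
```

Running the same clips on battery and on mains power, or after the laptop has warmed up, helps show whether thermal throttling changes the result.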

Large V2 vs Large V3 vs Large V3 Turbo

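Large V2 and Large V3 share the same 1550M-parameter architecture; Large V3 is the newer release and the full-quality modern Whisper large model. Large V3 Turbo is a fine-tuned version of Large V3 with the decoder cut from 32 layers to 4, intended to keep much of that quality while running far faster. For interactive dictation, Turbo is usually the better first test. For file transcription, compare both on your actual audio.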

The practical rule: choose Turbo when you need fast returns, choose Large V3 when you can wait, and choose a smaller model when memory or battery life becomes the bottleneck.

Whisper vs Parakeet vs Cloud Models

Whisper is not the only local speech model worth testing. Parakeet can be faster for supported languages on Apple Silicon, while cloud models can be stronger for accents, noisy audio, punctuation, and specialized vocabulary.

Whisper

Best broad local option, especially for multilingual coverage and Intel compatibility.

Parakeet

Fast local option on Apple Silicon, with Parakeet V3 covering 25 languages in Spokenly.

Cloud models

Useful when accuracy, punctuation, or vocabulary matter more than offline processing.

For fully offline work, see Spokenly's Local Only Mode.

FAQ

What are the Whisper model sizes?

OpenAI lists Tiny at 39M parameters, Base at 74M, Small at 244M, Medium at 769M, Large at 1550M, and Large V3 Turbo at 809M. Required VRAM ranges from about 1 GB for Tiny and Base to about 10 GB for Large.

What is the best Whisper model for dictation?

For dictation, Large V3 Turbo is often a good first model to test when the app and hardware support it. OpenAI lists it at roughly 8x the relative speed of the full Large model, and it keeps strong accuracy. On older hardware, Small or Medium may feel better because they return text faster.

Why do Whisper model sizes in MB differ across sites?

They may be measuring different artifacts: PyTorch checkpoints, Hugging Face safetensors, whisper.cpp GGML files, GGUF files, quantized versions, or app-packaged model bundles. Parameters are more stable than file size.

Is Whisper Large V3 Turbo better than Large V3?

Turbo is optimized for speed and is a strong default for real-time transcription. Large V3 can still be useful for highest quality file transcription, but it is heavier. For dictation, the faster model often produces a better user experience.

Can Whisper run offline?

Yes. Whisper models can run locally with no network connection when the app bundles or downloads the model. Spokenly includes local Whisper options for offline dictation and file transcription.

How much RAM does Whisper need?

It depends on model size, runtime, and hardware. OpenAI lists approximate VRAM from about 1 GB for Tiny and Base to 10 GB for Large. CPU and app runtimes can use different amounts of system memory.