Modes

Figure: Spokenly Modes list with shortcuts, app triggers, and agentic actions.

A mode is a saved dictation profile with its own AI Instructions, transcription model, AI provider, and output action. There is always one main mode (the default) plus any number of additional modes you create. Switching modes swaps the whole setup at once, so you can change profiles on the fly.

What you can set per mode

Name: a label to tell modes apart.
AI Instructions (optional): the instructions applied to your dictation. Translation, cleanup, and formatting are all driven from here; there is no separate language dropdown.
Transcription model: GPT-4o Transcribe, Deepgram Nova, Soniox, NVIDIA Parakeet, Whisper, or Apple Speech Analyzer.
AI provider (and fallback): Spokenly built-in, OpenAI, Anthropic (Claude), or any OpenAI-compatible provider via your own base URL and key (BYOK). With BYOK, GPT-5 gives the best quality; pick gpt-oss-120b if speed matters more.
Output action: auto-insert, paste and press Enter, copy to clipboard, or save to history only.
Agentic actions (macOS only): actions like Google Search, enabled per mode. See Agentic Actions.
Included context sent to the AI: clipboard, focused app, text around the cursor, active browser URL.
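
To picture how these settings fit together, here is a minimal Python sketch of a mode as a single record. It is an illustration only, not Spokenly's actual storage format: the `Mode` and `OutputAction` names, every field name, the auto-insert default, and the example instruction text are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class OutputAction(Enum):
    """The four output actions listed above (enum names are hypothetical)."""
    AUTO_INSERT = "auto-insert"
    PASTE_AND_ENTER = "paste and press Enter"
    COPY_TO_CLIPBOARD = "copy to clipboard"
    HISTORY_ONLY = "save to history only"


@dataclass
class Mode:
    """Hypothetical record holding everything a mode can set."""
    name: str
    transcription_model: str
    ai_provider: str
    ai_instructions: Optional[str] = None      # optional, as in the list above
    fallback_provider: Optional[str] = None
    # Assumption: auto-insert as the default; the docs do not state a default.
    output_action: OutputAction = OutputAction.AUTO_INSERT
    agentic_actions: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)


# Example: a translation mode driven entirely by its AI Instructions,
# since there is no separate language dropdown.
translate = Mode(
    name="Translate",
    transcription_model="Whisper",
    ai_provider="OpenAI",
    ai_instructions="Translate the dictation into Spanish.",
    output_action=OutputAction.COPY_TO_CLIPBOARD,
    context=["focused app"],
)
```

Keeping everything in one record mirrors the idea that switching modes swaps the whole setup at once.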

Switching modes

Switch with any one of these:

Keyboard shortcut (macOS): give a mode its own shortcut with an activation style (automatic, toggle, push-to-talk, or double-tap).
Modes list or overlay picker (macOS): pick a mode while dictating.
Auto-activation (macOS): let a mode turn on by itself for a specific app or website.
Trigger words: start a dictation with a mode's trigger word to run that mode on just that dictation. The word itself is dropped from the output: "email hey John, running late" runs the Email mode on "Hey John, running late".
Mode picker (iOS): switch between modes on your phone.
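
To make the trigger-word behavior concrete, here is a minimal Python sketch of dropping a leading trigger word from a dictation. How Spokenly actually matches trigger words is not documented here, so the `split_trigger` function, the `TRIGGERS` mapping, case-insensitive matching, and the tolerance for trailing punctuation are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from lowercase trigger word to mode name.
TRIGGERS: Dict[str, str] = {"email": "Email"}


def split_trigger(dictation: str, triggers: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (mode_name, remaining_text).

    If the first word is a known trigger word, that word is dropped and the
    matching mode name is returned. Otherwise the dictation is returned
    unchanged with mode_name set to None.
    """
    parts = dictation.lstrip().split(maxsplit=1)
    if not parts:
        return None, dictation
    # Assumption: ignore case and trailing punctuation on the first word.
    first_word = parts[0].rstrip(",.:;!?").lower()
    mode = triggers.get(first_word)
    if mode is None:
        return None, dictation
    remaining = parts[1] if len(parts) > 1 else ""
    return mode, remaining


mode, text = split_trigger("email hey John, running late", TRIGGERS)
```

Only a whole first word counts in this sketch, so "emailing the team" is left alone. Capitalization is untouched here; any cleanup of the remaining text is left to the mode that runs on it.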

If several apply at once, the order of precedence is: a trigger word (for that dictation only), then a mode's own shortcut, then a website match, then an app match, then the main mode. The one exception: a dictation started with an explicitly requested mode (deeplinks, Shortcuts, background dictation with a chosen prompt) ignores trigger words.
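
The precedence rule can be sketched as a small Python function. This is an illustration under stated assumptions, not Spokenly's implementation: the `Signals` record, its field names, and `resolve_mode` are hypothetical. It also assumes that an explicitly requested mode is used as-is, whereas the text above only states that such a dictation ignores trigger words.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical snapshot of which modes are being asked for right now."""
    explicit_mode: Optional[str] = None   # deeplink, Shortcuts, background dictation
    trigger_mode: Optional[str] = None    # mode named by a leading trigger word
    shortcut_mode: Optional[str] = None   # mode whose own shortcut was pressed
    website_mode: Optional[str] = None    # mode auto-activated for the active website
    app_mode: Optional[str] = None        # mode auto-activated for the focused app


def resolve_mode(signals: Signals, main_mode: str = "Main") -> str:
    """Pick the mode for one dictation using the documented order."""
    # Exception: an explicitly requested mode ignores trigger words.
    # Assumption: it is then used directly for this dictation.
    if signals.explicit_mode is not None:
        return signals.explicit_mode
    # Trigger word, then shortcut, then website match, then app match.
    for candidate in (
        signals.trigger_mode,
        signals.shortcut_mode,
        signals.website_mode,
        signals.app_mode,
    ):
        if candidate is not None:
            return candidate
    # Nothing else applies: fall back to the main mode.
    return main_mode


chosen = resolve_mode(Signals(trigger_mode="Email", app_mode="Code"))
```

Here `chosen` is the Email mode, because a trigger word outranks an app match for that dictation.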

Use cases

Here are a few ready-made setups. Each is just a mode with the right model, output action, and AI Instructions.

Translate as you speak

Clean up your dictation

Turn spoken punctuation into symbols

Automate with Google Search
