Codex Voice Mode: Voice Input for OpenAI Codex CLI

Q: How do I add voice input to Codex CLI?

Download Spokenly from spokenly.app/download, then run: codex mcp add spokenly --url http://localhost:51089. Add an instruction to ~/.codex/AGENTS.md telling Codex to ask its questions through Spokenly, then restart Codex.

Spokenly is a voice dictation app for macOS and iOS. Press a shortcut, speak, and text appears at the cursor in any app. It also connects to Codex CLI via MCP, so the agent can ask you questions and get voice answers directly.

Updated June 2026

Does the Codex CLI Have a Voice Mode?

Yes, in two ways. The Codex CLI ships with built-in dictation: press the spacebar in the prompt and speak your input instead of typing it. That is Codex's native voice mode, and it works fine for dictating a single prompt.

The limit is that this voice mode only covers the prompt box. When Codex asks a follow-up question mid-task, or you want to dictate anywhere else on your Mac, the built-in mode can't help. Spokenly fills that gap: it connects to the Codex CLI through MCP, so the agent can ask you questions and you answer by voice, and the same shortcut dictates in every other app. The rest of this guide walks through the setup.

Codex Has Built-in Dictation. Why Spokenly?

Codex CLI has built-in dictation: press the spacebar to dictate prompts directly in the terminal. It works, but it covers only one scenario: dictating your initial input to Codex.

Spokenly is a full dictation app that works in any text field on your Mac and iPhone. It also integrates with Codex via MCP: the agent calls Spokenly whenever it needs your input during a task. You see the specific question, speak your answer, and the transcript goes back to the agent as the tool's result. The agent starts the exchange itself, rather than waiting for you to press a dictation key.

How It Works

Spokenly runs a local MCP server at localhost:51089. Register it with one command and Codex gains a voice dictation tool. Instead of printing a question and waiting for you to type, the agent calls the tool directly.

You see the agent's question in Spokenly's overlay, speak your answer, and press Enter. The transcribed text goes straight back to Codex. There's no context switch. You stay focused on the problem instead of dropping into a text editor to type a reply.

Codex uses the HTTP MCP transport and connects directly to Spokenly's local server. No bridge scripts, no extra configuration beyond the initial setup command.
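
Because Codex reaches Spokenly over plain HTTP on localhost, a quick sanity check before registering is whether anything accepts connections on port 51089 at all. The sketch below assumes bash (its /dev/tcp redirection is a bash feature, not POSIX sh), and port_listening is a hypothetical helper name. A successful connect only shows that some process is listening; it does not confirm the listener is Spokenly or that it speaks MCP.

```shell
#!/bin/bash
# Sketch: check whether anything accepts TCP connections on Spokenly's
# MCP port (51089). Uses bash's /dev/tcp redirection, so run it with bash,
# not sh. A successful connect does not prove the listener is Spokenly.

# port_listening HOST PORT (hypothetical helper):
# succeed if HOST:PORT accepts a TCP connection.
port_listening() {
  # Open fd 3 to HOST:PORT inside a subshell. The subshell closes the
  # connection when it exits and keeps a failed redirection from
  # terminating the calling script; stderr noise is discarded.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_listening localhost 51089; then
  echo "Something is listening on localhost:51089."
else
  echo "Nothing is listening on localhost:51089. Is Spokenly running?" >&2
fi
```

If it reports that nothing is listening, launch Spokenly and check again before running codex mcp add.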

What You Get

Natural voice responses

Codex asks questions, you answer by speaking. Give detailed architecture decisions, explain business logic, or describe bugs. Voice captures nuance that short typed answers miss.

HTTP MCP connection

Codex connects to Spokenly via HTTP MCP at localhost:51089. No bridge scripts, no wrapper processes. Just a standard URL.

Instant setup

One command to register, one line in AGENTS.md. Under a minute from install to first voice interaction.

Local models available

Switch to local Whisper or Parakeet and your voice never leaves the device: all transcription runs on your Mac, and no audio is sent to any server.

Quick Setup

Download the sideload version from spokenly.app/download and launch it. On first launch, the app will offer to set up the Codex integration automatically with a single terminal command.

Or set up manually:

1. Register the MCP tool. Run in Terminal:
   codex mcp add spokenly --url http://localhost:51089
2. Add the instruction to AGENTS.md. Add this line to ~/.codex/AGENTS.md:
   ALWAYS ask questions via the ask_user_dictation tool from the spokenly MCP server, never as plain text.
3. Restart Codex. Codex picks up the voice tool on restart. Test it with: "Ask me 3 questions".
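
Step 2 can also be scripted so it is safe to repeat. The sketch below assumes a POSIX shell and the default ~/.codex/AGENTS.md location from step 2; it appends the instruction only if that exact line is not already present. The helper name append_once is hypothetical, and the script does not run step 1 for you.

```shell
#!/bin/sh
# Sketch: add the Spokenly instruction from step 2 to ~/.codex/AGENTS.md
# only if it is not already there, so re-running it is harmless.
# Step 1 (codex mcp add ...) is NOT performed here.

AGENTS_FILE="$HOME/.codex/AGENTS.md"
INSTRUCTION='ALWAYS ask questions via the ask_user_dictation tool from the spokenly MCP server, never as plain text.'

# append_once FILE LINE (hypothetical helper):
# append LINE to FILE unless an identical full line already exists.
append_once() {
  mkdir -p "$(dirname "$1")" || return 1
  touch "$1" || return 1
  # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
  if grep -qxF -- "$2" "$1"; then
    return 0
  fi
  # If the file does not end with a newline, add one so LINE starts on its own line.
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
  printf '%s\n' "$2" >> "$1"
}

append_once "$AGENTS_FILE" "$INSTRUCTION"
```

Run it once with sh, then restart Codex as in step 3. Running it again leaves a single copy of the line.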

See the full setup guide for troubleshooting.

Real Workflow Examples

Task planning: You tell Codex to interview you before writing any code. The agent asks question after question: expected behavior, error states, data formats, dependencies. You answer each one by voice in seconds. The agent builds a complete brief from your answers. Ten minutes of talking instead of an hour of writing.

Refactoring: Codex asks "Should I extract the validation logic into a shared util or keep it inline?" You explain which validators are reused, where the edge cases differ, and why one module should stay self-contained. A quick voice answer beats a terse "extract it".

Long context: Codex asks "How should the sync process handle conflicts between local and remote data?" Instead of typing a few words and moving on, you spend 30 seconds explaining the merge strategy, priority rules, and what the user should see. The agent gets the full picture.

Spokenly vs Codex Built-in Dictation

Feature	Spokenly	Codex built-in dictation
Manual dictation	Yes, keyboard shortcut in any app	Yes, press spacebar in Codex
Agent-initiated Q&A	Yes, agent calls voice tool via MCP	No
Works outside Codex	Yes, any app on your Mac	No
Local/offline models	Yes (Whisper, Parakeet)	No
Custom AI prompts	Yes	No
iOS app	Yes	No
Price	Free (local + own API keys)	Included with Codex

You can use both together: Codex's built-in dictation for prompts, and Spokenly for agent-initiated questions and dictation everywhere else on your Mac.

Frequently Asked Questions

Is Spokenly free to use with Codex?

Yes. The MCP server and the local speech-to-text models (Whisper, Parakeet) are free. Cloud models are available through the Pro plan or with your own API keys. Local models need no subscription.

Does Codex have MCP timeout issues like Claude Code?

No. Codex uses HTTP MCP, which keeps the connection open reliably. There's no timeout limit on voice sessions.

Can I use Spokenly with Codex and Claude Code at the same time?

Yes. Spokenly's MCP server handles one recording session at a time, but you can register it with several tools. When one agent calls the dictation tool, you respond, and the result goes back to the agent that asked.

What transcription models does Spokenly support?

Local: Whisper and Parakeet on Apple Silicon. Cloud: GPT-4o Transcribe, Deepgram Nova, and Groq Whisper, through the Pro plan or your own API keys. You choose the model per session.

Does my voice data stay private?

With local models, all speech processing happens on your Mac. Audio never leaves your device. With cloud models, audio reaches the chosen provider: directly with BYOK, or proxied through Spokenly's backend with Pro. Nothing is stored on our servers.

Does the Codex CLI have a voice mode?

Yes. Codex has built-in dictation: press the spacebar in the prompt to speak instead of typing, but it covers prompts only. Spokenly adds voice input across your whole Mac and lets Codex ask you questions by voice through MCP, so you are not limited to the prompt box.

Voice Input for Other Tools

Claude Code, Claude Cowork, Cursor. See all integrations.

Talk to Codex

Set up in under a minute. Free with local speech-to-text models.

Free MCP server

Local models included

Works offline