speech

Generate speech audio from text using AI text-to-speech models. Supports OpenAI (tts-1, tts-1-hd, gpt-4o-mini-tts), ElevenLabs (eleven_v3, eleven_multilingual_v2, eleven_flash_v2_5), and Cartesia (sonic-3.5, sonic-3). Speech requests use your own provider API key directly; the egaki subscription does not cover speech yet.

Usage

egaki speech [text]

Arguments

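| Argument | Required | Description |
| --- | --- | --- |
| [text] | No | Text to convert to speech. Can be omitted when the text is read with --stdin |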

Options

| Option | Default | Description |
| --- | --- | --- |
| -m, --model [model] | - | Speech model ID. If omitted, shows an interactive picker (or uses default in non-TTY mode) |
| -o, --output [path] | egaki-output | Output file path (extension added from audio format if omitted) |
| --voice [voice] | - | Voice ID or name (provider-specific). OpenAI: alloy, echo, nova, etc. ElevenLabs/Cartesia: voice ID from their library |
| --output-format [format] | - | Audio output format: mp3, wav, pcm, opus, aac, flac (provider support varies) |
| --speed [speed] | - | Playback speed multiplier (OpenAI: 0.25-4.0, Cartesia: 0.6-1.5) |
| --instructions [text] | - | Style instructions for speech (only gpt-4o-mini-tts and ElevenLabs). E.g. "Speak in a calm, soothing tone" |
| --language [lang] | - | ISO 639-1 language code (e.g. en, es, fr). ElevenLabs and Cartesia |
| --stdin | - | Read text from stdin instead of the positional argument |
| --json | - | Output result metadata as JSON to stdout (model, file path, cost) |
| --stdout | - | Write raw audio bytes to stdout instead of saving to a file. Useful for piping |
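
Because the accepted --speed range depends on the provider, it can help to check a value before calling egaki. The following is a minimal sketch, not part of egaki: `check_speed` is a hypothetical helper, and the model-to-provider mapping is assumed from the model IDs listed on this page. It exits 0 when the speed is inside the documented range, 1 when it is outside, and 2 when this page documents no range for the model (for example the ElevenLabs models).

```shell
# check_speed MODEL SPEED
# Hypothetical helper (not an egaki command). Ranges are copied from the
# --speed option description; the model-to-provider mapping is an assumption.
check_speed() {
  case $1 in
    tts-1|tts-1-hd|gpt-4o-mini-tts) speed_lo=0.25; speed_hi=4.0 ;;  # OpenAI
    sonic-*)                        speed_lo=0.6;  speed_hi=1.5 ;;  # Cartesia
    *) return 2 ;;  # no speed range documented for this model
  esac
  # awk does the floating-point comparison; exit status 0 means "in range".
  awk -v s="$2" -v lo="$speed_lo" -v hi="$speed_hi" \
    'BEGIN { exit !(s + 0 >= lo + 0 && s + 0 <= hi + 0) }'
}
```

A possible use: `check_speed sonic-3 1.2 && egaki speech "Hello" -m sonic-3 --speed 1.2`.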

Global Options

| Option | Default | Description |
| --- | --- | --- |
| -h, --help | - | Display this message |
| -v, --version | - | Display version number |

Examples

# Generate speech with default model
egaki speech "Hello, welcome to egaki!"
# Use a specific model and voice
egaki speech "Good morning" -m tts-1-hd --voice nova
# Use GPT-4o Mini TTS with style instructions
egaki speech "Breaking news today" -m gpt-4o-mini-tts --instructions "Speak like a news anchor"
# ElevenLabs with a voice ID
egaki speech "Hello world" -m eleven_v3 --voice 21m00Tcm4TlvDq8ikWAM
# Read text from stdin
cat script.txt | egaki speech --stdin -o narration.mp3
# Cartesia Sonic 3.5 with a voice ID
egaki speech "Hello world" -m sonic-3.5 --voice f786b574-daa5-4673-aa0c-cbe3e8534c02
# Pipe audio to another tool
egaki speech "test" --stdout | ffplay -nodisp -autoexit -
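
The --stdin and -o options shown above can be combined in a loop to narrate several text files in one go. This is a rough sketch under stated assumptions: `narrate_dir` is a hypothetical helper name, the default model tts-1 and the .mp3 extension are illustrative choices, and the model is passed explicitly so the interactive picker is never triggered.

```shell
# narrate_dir SRC_DIR OUT_DIR [MODEL]
# Hypothetical helper (not an egaki command): writes one audio file per
# .txt file in SRC_DIR, using only the documented --stdin, -m and -o options.
narrate_dir() {
  src_dir=$1
  out_dir=$2
  model=${3:-tts-1}            # assumed default; any listed model ID works
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.txt; do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
    name=$(basename "$f" .txt)
    # Feed the file on stdin; stop at the first failed conversion.
    egaki speech --stdin -m "$model" -o "$out_dir/$name.mp3" < "$f" || return 1
  done
}
```

For example, `narrate_dir ./chapters ./audio tts-1-hd` would write ./audio/NAME.mp3 for each ./chapters/NAME.txt.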