voice clone

Clone a voice from an audio clip. Returns a voice ID for use with 'egaki speech --voice <id>'. Supports Cartesia (default) and ElevenLabs.
Best practices for high-quality clones:
  1. Isolate vocals first: 'egaki demucs recording.mp3 --stems vocals' removes background music, noise, and other speakers.
  2. Find a clean snippet: run 'egaki transcribe recording-vocals.mp3' to get word timestamps, then pick a 5-10s segment with a complete phrase, clear speech, and no hesitations or crosstalk.
  3. Trim to speech boundaries: 'ffmpeg -i recording-vocals.mp3 -ss 12.5 -to 22.0 -c copy clip.mp3'. Leave no silence padding at the start or end.
  4. Match energy to intent: the clone mimics the tone and pacing of the source clip. Use an energetic clip for energetic output.
  5. Use a clip spoken in the target language. For Cartesia clones, also pass --language.
Cartesia: up to 10s of audio, instant, free. Good for short, clean clips.
ElevenLabs: 1-3 min of audio recommended; has a --remove-background-noise option.
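
The length guidance above can be checked before cloning. The following is a minimal sketch, assuming ffprobe (shipped with FFmpeg) is installed. The helper names pick_provider and clip_duration are hypothetical and not part of egaki, and the 10-second cutoff is simply the Cartesia limit stated above.

```shell
# Hypothetical helper: suggest a provider from a clip duration in seconds.
# The 10-second cutoff mirrors the Cartesia limit described above.
pick_provider() {
  awk -v d="$1" 'BEGIN { if (d + 0 <= 10) print "cartesia"; else print "elevenlabs" }'
}

# Hypothetical helper: print a clip's duration in seconds using ffprobe.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

With a real clip on disk, 'pick_provider "$(clip_duration clip.mp3)"' prints a provider name that could be passed to --provider. Clips longer than 10s but well short of a minute fall outside both recommendations, so trimming further may be the better choice there.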

Usage

egaki voice clone [audio]

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| [audio] | No | Path to the audio clip to clone from; omit when using --stdin |

Options

| Option | Default | Description |
| --- | --- | --- |
| --name [name] | - | Name for the cloned voice (required) |
| -p, --provider [provider] | cartesia | Voice cloning provider: cartesia or elevenlabs |
| --language [lang] | - | Cartesia only: ISO 639-1 language code (default: en). E.g. en, es, fr, de, ja |
| --description [text] | - | Optional description for the voice |
| --base-voice-id [id] | - | Cartesia: optional base voice ID to derive from |
| --remove-background-noise | - | ElevenLabs: apply AI noise removal to the clip before cloning |
| --stdin | - | Read audio from stdin instead of a file path |
| --json | - | Output result as JSON to stdout |
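
The --stdin and --json options can be combined so that a trimmed segment is cloned without writing an intermediate clip file. This is a minimal sketch under two assumptions: that egaki accepts MP3 data on stdin (the accepted stream formats are not specified here) and that the local ffmpeg build includes an MP3 encoder. The function name clone_segment is hypothetical.

```shell
# Hypothetical helper: cut a segment with ffmpeg and pipe it straight into
# 'egaki voice clone'. Arguments: input file, start time, end time, voice name.
# '-f mp3 -' writes the trimmed (re-encoded) audio to stdout as MP3.
# --stdin makes egaki read the clip from the pipe; --json prints the result
# as JSON on stdout.
clone_segment() {
  ffmpeg -v error -i "$1" -ss "$2" -to "$3" -f mp3 - |
    egaki voice clone --stdin --name "$4" --json
}
```

For example, 'clone_segment recording-vocals.mp3 12.5 22.0 "Speaker" > voice.json' would save the JSON result to a file for later inspection. Because the segment is re-encoded rather than stream-copied, the cut points are not tied to MP3 frame boundaries.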

Global Options

| Option | Default | Description |
| --- | --- | --- |
| -h, --help | - | Display this message |
| -v, --version | - | Display version number |

Examples

# Clone a voice from a recording
egaki voice clone recording.wav --name "My Voice"
# Clone with ElevenLabs and noise removal
egaki voice clone vocals.mp3 --name "Narrator" --provider elevenlabs --remove-background-noise
# Full pipeline: separate → trim → clone
egaki demucs interview.mp3 --stems vocals
ffmpeg -i interview-vocals.mp3 -ss 5.0 -to 15.0 -c copy clip.mp3
egaki voice clone clip.mp3 --name "Speaker"
# Use the cloned voice
egaki speech "Hello world" --voice <voice-id>