text and instructions on the preview. Speaker consent is required; see voice consent.
Recording requirements
- Upload exactly one sample file.
- Use a single speaker with no background music or noise.
- Use MP3 or WAV.
- Maximum duration: 30 seconds.
- Maximum file size: 5 MB.
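The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `check_sample` and the constants are hypothetical, "5 MB" is assumed to mean 5 × 1024 × 1024 bytes, and duration is only measured for WAV files because the Python standard library cannot read MP3 length. The single-speaker and no-background-noise requirements cannot be checked this way.

```python
import wave
from pathlib import Path

# Limits taken from the recording requirements above.
# Assumption: "5 MB" is treated as 5 MiB; adjust if the API counts 5,000,000 bytes.
MAX_BYTES = 5 * 1024 * 1024
MAX_SECONDS = 30.0
ALLOWED_SUFFIXES = {".mp3", ".wav"}


def check_sample(path):
    """Return a list of problems found; an empty list means the local checks passed."""
    p = Path(path)
    problems = []

    suffix = p.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'no extension'}")

    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file size {size} bytes exceeds {MAX_BYTES}")

    # Duration is only measured for WAV; MP3 needs a third-party decoder.
    if suffix == ".wav":
        try:
            with wave.open(str(p), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
            if seconds > MAX_SECONDS:
                problems.append(f"duration {seconds:.1f}s exceeds {MAX_SECONDS:.0f}s")
        except wave.Error as exc:
            problems.append(f"unreadable WAV file: {exc}")

    return problems
```

Calling `check_sample("sample.wav")` before the upload surfaces format, size, and WAV duration problems early; the server remains the final authority on whether a sample is accepted.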
Step 1: Create a preview
generated_voice_id, not a saved voice_id. text defaults to the Breeze clone script when omitted or empty. instructions defaults to none. SDKs also accept files with exactly one item.
Step 2: Stream the preview
Step 3: Save the preview as a voice
POST /v1/text-to-speech/{voice_id} calls use the original uploaded sample and transcript, not the preview audio.
Editing or deleting a saved voice
- Update labels and the description with PATCH /v1/voices/{voice_id}.
- Tune defaults with PATCH /v1/voices/{voice_id}/settings.
- Remove the voice with DELETE /v1/voices/{voice_id}.
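As a rough illustration of the three endpoints above, the sketch below builds the corresponding HTTP requests with the Python standard library. Only the methods and paths come from this page. The host `https://api.example.com`, the Bearer authorization header, the JSON request bodies, and all function names are assumptions made for illustration, so check them against the API reference before use.

```python
import json
import urllib.request

# Hypothetical host; replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) a request; the auth scheme here is an assumption."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth header
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def update_voice(voice_id, api_key, fields):
    # PATCH /v1/voices/{voice_id}: labels and description.
    # The field names inside `fields` are not specified on this page.
    return build_request("PATCH", f"/v1/voices/{voice_id}", api_key, fields)


def update_voice_settings(voice_id, api_key, settings):
    # PATCH /v1/voices/{voice_id}/settings: default tuning.
    return build_request("PATCH", f"/v1/voices/{voice_id}/settings", api_key, settings)


def delete_voice(voice_id, api_key):
    # DELETE /v1/voices/{voice_id}: removes the saved voice.
    return build_request("DELETE", f"/v1/voices/{voice_id}", api_key)
```

Each helper returns a `urllib.request.Request` that can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes it easy to inspect the method and URL first.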
Continue building
Voices
Learn how cloned voices, designed voices, public voices, and voice settings fit together.
Text to speech
Use the saved cloned voice for dialogue lines, previews, or production audio.
Managing history
Download generated audio, replay previous requests, and clean up test runs.
Rate limits
Plan clone and generation retries around shared Studio/API concurrency limits.

