The `output_format` query parameter selects the audio encoding. Non-streaming text to speech accepts `mp3`, `wav`, `flac`, `pcm`, `aac`, and `opus`. Streaming supports only `mp3` and `pcm`.
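As a minimal sketch of how the parameter travels on a request, the snippet below appends `output_format` to a URL as a query parameter. The base URL and the `build_tts_url` helper are hypothetical and used only for illustration; substitute the real endpoint from the API reference.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration (not a real API URL).
BASE_URL = "https://api.example.com/v1/text-to-speech"


def build_tts_url(output_format: str = "mp3", base_url: str = BASE_URL) -> str:
    """Return the request URL with output_format set as a query parameter."""
    query = urlencode({"output_format": output_format})
    return f"{base_url}?{query}"


print(build_tts_url("flac"))
# prints "https://api.example.com/v1/text-to-speech?output_format=flac"
```

The default of `mp3` mirrors the default encoding described on this page.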
Supported encodings
| Value | Format | Notes |
|---|---|---|
| mp3 | MP3 | Default. Best for general playback. |
| wav | WAV | Uncompressed, lossless. |
| flac | FLAC | Lossless compression. |
| pcm | Raw PCM | Linear PCM for audio pipelines. |
| aac | AAC | Efficient lossy codec. |
| opus | Opus | Non-streaming only. |
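Checking the format on the client before sending a request can catch a mismatch early. The following is a sketch, not part of any SDK: the `validate_output_format` helper and the two constant names are assumptions, and the sets simply restate the formats listed on this page.

```python
# Formats as listed on this page; the names below are illustrative, not SDK constants.
NON_STREAMING_FORMATS = frozenset({"mp3", "wav", "flac", "pcm", "aac", "opus"})
STREAMING_FORMATS = frozenset({"mp3", "pcm"})


def validate_output_format(output_format: str, streaming: bool = False) -> str:
    """Return output_format unchanged if it is allowed for the request type."""
    allowed = STREAMING_FORMATS if streaming else NON_STREAMING_FORMATS
    if output_format not in allowed:
        mode = "streaming" if streaming else "non-streaming"
        raise ValueError(
            f"{output_format!r} is not supported for {mode} requests; "
            f"choose one of {sorted(allowed)}"
        )
    return output_format


validate_output_format("opus")                  # accepted for non-streaming
validate_output_format("pcm", streaming=True)   # accepted for streaming
```

Calling `validate_output_format("opus", streaming=True)` raises `ValueError`, since streaming supports only `mp3` and `pcm`.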
Choosing a format
- End-user playback in a browser or app: prefer `mp3`.
- Real-time applications: use the streaming endpoint with `mp3` or `pcm`.
- Archival or post-processing: use `wav` or `flac` for lossless output.
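The guidance above can be captured as a small lookup, sketched here under the assumption that an application sorts its needs into these three use cases. The `UseCase` enum and `recommend_formats` helper are hypothetical names, not part of the SDK.

```python
from enum import Enum


class UseCase(Enum):
    """Illustrative use cases matching the guidance on this page."""
    PLAYBACK = "playback"      # end-user playback in a browser or app
    REAL_TIME = "real_time"    # streaming endpoint
    ARCHIVAL = "archival"      # archival or post-processing


# Recommended formats per use case, first entry preferred.
RECOMMENDED_FORMATS = {
    UseCase.PLAYBACK: ("mp3",),
    UseCase.REAL_TIME: ("mp3", "pcm"),
    UseCase.ARCHIVAL: ("wav", "flac"),
}


def recommend_formats(use_case: UseCase) -> tuple[str, ...]:
    """Return the formats this page suggests for the given use case."""
    return RECOMMENDED_FORMATS[use_case]


print(recommend_formats(UseCase.ARCHIVAL))
# prints "('wav', 'flac')"
```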
Use formats with
- SDK quickstart: Generate and save your first MP3 with the Python or TypeScript SDK.
- Text to speech: Pass `output_format` on sync, async, and streaming generation requests.
- Streaming: Use streaming-compatible formats for lower-latency playback.
- CLI text to speech: Save generated audio from the command line while prototyping voices and formats.

