Concurrent generations
Concurrent generations are synthesis jobs that are admitted or running. Queued generations are jobs that wait briefly in the background until admitted capacity frees up. Studio and Developer API jobs share the same per-user pool. Text to speech, streaming text to speech, async text to speech, voice design, and voice cloning all count toward these limits. Voice design counts each requested preview as one generation; when `preview_count` is omitted, Breeze generates one preview.
| Plan | Concurrent generations | Queued generations |
|---|---|---|
| Free | 3 | 0 |
| Starter | 6 | 2 |
| Creator | 10 | 3 |
| Pro | 20 | 5 |
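A client cannot see Studio jobs or other workers that share its pool. It can still avoid adding to the problem by capping its own in-flight generations at the plan's concurrent limit. The sketch below is a minimal `asyncio` example that counts a voice design request as `preview_count` generations (one when omitted), following the rule above. `PLAN_LIMITS`, `generation_cost`, `GenerationLimiter`, and job-kind labels such as `"voice_design"` are hypothetical names for illustration, not part of a Breeze SDK.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Optional

# Concurrent and queued limits per plan, copied from the table above.
PLAN_LIMITS = {
    "free": {"concurrent": 3, "queued": 0},
    "starter": {"concurrent": 6, "queued": 2},
    "creator": {"concurrent": 10, "queued": 3},
    "pro": {"concurrent": 20, "queued": 5},
}


def generation_cost(kind: str, preview_count: Optional[int] = None) -> int:
    """Generations one request consumes (the kind labels are hypothetical).

    Voice design counts one generation per requested preview, and an omitted
    preview_count means one preview. Every other job type counts as one.
    """
    if kind == "voice_design":
        return 1 if preview_count is None else preview_count
    return 1


class GenerationLimiter:
    """Hypothetical client-side cap on this process's in-flight generations."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0
        self._cond = asyncio.Condition()

    @contextlib.asynccontextmanager
    async def slots(self, cost: int) -> AsyncIterator[None]:
        if cost > self.capacity:
            raise ValueError(f"cost {cost} exceeds capacity {self.capacity}")
        async with self._cond:
            # Wait until enough slots are free for the whole request.
            await self._cond.wait_for(lambda: self.in_use + cost <= self.capacity)
            self.in_use += cost
        try:
            yield
        finally:
            # Return the slots and wake any waiting requests.
            async with self._cond:
                self.in_use -= cost
                self._cond.notify_all()


async def demo() -> int:
    """Run a few fake jobs on the Free plan and report peak slot usage."""
    # Create the limiter inside the running loop so its Condition binds to it.
    limiter = GenerationLimiter(PLAN_LIMITS["free"]["concurrent"])
    peak = 0

    async def job(cost: int) -> None:
        nonlocal peak
        async with limiter.slots(cost):
            peak = max(peak, limiter.in_use)
            await asyncio.sleep(0.01)  # stand-in for a real synthesis call

    costs = [
        generation_cost("tts"),
        generation_cost("voice_design", preview_count=2),
        generation_cost("voice_cloning"),
    ]
    await asyncio.gather(*(job(c) for c in costs))
    return peak
```

The cap ignores the queued column, so excess requests wait on the client instead of in Breeze's queue. The limiter is also not FIFO-fair: a large voice design request can wait behind a stream of single-generation jobs.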
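When a request would exceed your plan's concurrent and queued generation limits, the API returns `429 GENERATION_CONCURRENCY_EXCEEDED`. When Breeze's shared generation capacity is temporarily unavailable, the API returns `503 GENERATION_CAPACITY_EXCEEDED`.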
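The two codes point at different bottlenecks. A 429 means your own per-user pool is full, while a 503 means Breeze's shared capacity is busy. The sketch below tells them apart, assuming you have already read the HTTP status and the error code from the response. The error body format is not described here, so extraction is left to the caller. `retry_reason` is a hypothetical helper, and treating every other 5xx as transient is an assumption drawn from the best practices, not a documented guarantee.

```python
from typing import Optional


def retry_reason(status: int, error_code: Optional[str] = None) -> Optional[str]:
    """Explain why a failed generation request may be retried, or return None.

    error_code is whatever code the caller extracted from the error response.
    """
    if status == 429 and error_code == "GENERATION_CONCURRENCY_EXCEEDED":
        # Your per-user pool (Studio and API combined) is full.
        return "per-user concurrency limit reached; wait for running jobs to finish"
    if status == 503 and error_code == "GENERATION_CAPACITY_EXCEEDED":
        # Breeze's shared capacity is busy; this is not caused by your own usage.
        return "shared generation capacity temporarily unavailable"
    if 500 <= status <= 599:
        # Other 5xx responses (for example 504) are treated as transient here.
        return "transient server error"
    # Anything else, including a 429 with an unrecognized code, is not retried.
    return None
```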
Best practices
- Use exponential backoff with jitter for `429 GENERATION_CONCURRENCY_EXCEEDED` and transient `5xx` responses.
- Group text into natural units, such as sentences or paragraphs, instead of sending one word at a time.
- Treat async text to speech as a background job and poll for completion. Async delivery does not bypass concurrent generation limits.
- Cache responses when appropriate, because identical inputs are billed again on every request.
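The backoff advice can be sketched with "full jitter". Each wait is drawn uniformly between zero and an exponentially growing ceiling, so many clients do not retry in lockstep. This is one common jitter strategy, not necessarily the one Breeze uses internally. `RetryableError`, `call_with_retries`, the 0.5 s base, and the 30 s cap are assumptions. Your request helper decides which responses raise `RetryableError`, for example a 429 with `GENERATION_CONCURRENCY_EXCEEDED` or a transient 5xx.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error your request helper raises for 429 or transient 5xx."""

    def __init__(self, status: int) -> None:
        super().__init__(f"retryable HTTP {status}")
        self.status = status


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    upper = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, upper)


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying RetryableError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test. In an `asyncio` client you would await `asyncio.sleep` instead.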
Design around limits
Pricing
See how plan limits, queued generations, and shared Studio/API credits fit together.
Text to speech
Use async jobs for longer text and retry only the failed segments of a batch.
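Splitting long text at sentence boundaries keeps each request a natural size. Tracking results by segment index lets a batch retry only what failed. The sketch below assumes a simple punctuation-based sentence splitter, which will mis-split abbreviations such as "Dr.". It also assumes a hypothetical `synthesize` callable that stands in for however you submit one segment, for example as an async text-to-speech job. `split_into_segments`, `synthesize_batch`, and the 500-character default are illustrative choices, not Breeze limits.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def split_into_segments(text: str, max_chars: int = 500) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars becomes its own segment.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


def synthesize_batch(
    segments: List[str],
    synthesize: Callable[[str], bytes],
    done: Optional[Dict[int, bytes]] = None,
) -> Tuple[Dict[int, bytes], List[int]]:
    """Synthesize every segment whose index is not already in done.

    Returns (results by segment index, indices that failed this pass), so a
    later call with the same segments and results retries only the failures.
    """
    results = dict(done or {})
    failed: List[int] = []
    for index, segment in enumerate(segments):
        if index in results:
            continue  # already synthesized; re-sending would be billed again
        try:
            results[index] = synthesize(segment)
        except Exception:  # a real client would catch only its retryable errors
            failed.append(index)
    return results, failed
```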
Streaming
Choose streaming for lower time to first byte. Streaming still counts toward generation concurrency limits.
Errors
Handle 429, 503, and 504 responses with the right retry behavior.
