Rate limits - Breeze Blue

Concurrent generations

Concurrent generations are synthesis jobs that have been admitted or are running. Queued generations are jobs that wait briefly in the background until admitted capacity frees up. Studio and Developer API jobs share the same per-user pool.

Text to speech, streaming text to speech, async text to speech, voice design, and voice cloning all count toward these limits. Voice design counts each requested preview as one generation. When preview_count is omitted, Breeze generates one preview.

Plan	Concurrent generations	Queued generations
Free	3	0
Starter	6	2
Creator	10	3
Pro	20	5

When both admitted and queued generation capacity are full, the API returns 429 GENERATION_CONCURRENCY_EXCEEDED. When Breeze's shared generation capacity is temporarily unavailable, the API returns 503 GENERATION_CAPACITY_EXCEEDED.
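As a rough mental model of the table and the 429 rule above, the following sketch classifies a new request from the number of generations already in flight. It is a simplification written for illustration, not Breeze's admission logic: the function names and the `in_flight` counter are hypothetical, and it assumes a multi-preview voice design request is admitted or rejected as a whole.

```python
from typing import Optional

# (concurrent, queued) limits per plan, copied from the table above.
PLAN_LIMITS = {
    "Free": (3, 0),
    "Starter": (6, 2),
    "Creator": (10, 3),
    "Pro": (20, 5),
}


def generations_requested(preview_count: Optional[int] = None) -> int:
    """Voice design counts each preview as one generation; omitted means one."""
    return 1 if preview_count is None else preview_count


def admit(plan: str, in_flight: int, requested: int = 1) -> str:
    """Classify a new request (hypothetical helper, not a Breeze API).

    in_flight is the number of generations already admitted or queued.
    """
    concurrent, queued = PLAN_LIMITS[plan]
    total = in_flight + requested
    if total <= concurrent:
        return "admitted"
    if total <= concurrent + queued:
        return "queued"
    return "429 GENERATION_CONCURRENCY_EXCEEDED"
```

For example, a Starter user with 6 generations in flight would have the next request queued, while a Free user with 3 in flight would receive the 429 immediately because the Free plan has no queue.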

Best practices

Use exponential backoff with jitter for 429 GENERATION_CONCURRENCY_EXCEEDED and transient 5xx responses.
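A minimal sketch of exponential backoff with full jitter is shown here. The `send` callable, the set of status codes treated as transient, and the base and cap values are all assumptions chosen for illustration; adjust them to your client and to the error reference.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: which 5xx codes count as transient is not specified here.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    # Every attempt was retryable; hand the last response back to the caller.
    return status, body
```

Jitter spreads retries from many clients over time, so they do not all hit the pool again at the same instant after a shared 429 or 503.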

Group text into natural requests instead of sending one word at a time.
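One way to group text is to pack whole sentences into chunks before sending them. The sketch below is illustrative only: the `max_chars` default is an arbitrary assumption rather than a Breeze limit, and the sentence splitter is a simple regular expression that will not handle every abbreviation.

```python
import re
from typing import List


def group_sentences(text: str, max_chars: int = 500) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one request, which uses far fewer generation slots than sending words individually.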

Treat async text to speech as a background job and poll for completion. Async delivery does not bypass concurrent generation limits.
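Polling can be sketched as a loop with a fixed interval and an overall timeout. The `get_status` callable and the status strings `"completed"` and `"failed"` are hypothetical placeholders, since the job status schema is not described on this page.

```python
import time
from typing import Callable

# Assumption: these terminal status names are illustrative, not documented values.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches a terminal status or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Because an async job still occupies a generation slot, submitting many jobs at once can still produce 429 responses; polling only changes how the result is delivered.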

Cache responses when appropriate. Identical inputs are billed again each time they are submitted.
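A small in-memory cache keyed on a hash of the request inputs is one way to avoid paying twice for the same audio. In this sketch the `synthesize` callable and the `voice_id` parameter are hypothetical stand-ins for your own client call; include every input that affects the output in the key.

```python
import hashlib
import json
from typing import Callable, Dict


class SynthesisCache:
    """Memoize synthesis results by a hash of the inputs (illustrative only)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(text: str, voice_id: str) -> str:
        """Build a stable key from all inputs that influence the audio."""
        payload = json.dumps({"text": text, "voice_id": voice_id}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, text: str, voice_id: str) -> bytes:
        """Return cached audio, calling synthesize only on a cache miss."""
        cache_key = self.key(text, voice_id)
        if cache_key not in self._store:
            self._store[cache_key] = self._synthesize(text, voice_id)
        return self._store[cache_key]
```

A production cache would usually persist to disk or object storage and bound its size, but the keying idea is the same.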

Design around limits

Pricing

See how plan limits, queued generations, and shared Studio/API credits fit together.

Text to speech

Use async jobs for longer text and retry only the failed segments of a batch.

Streaming

Choose streaming for lower time-to-first-byte without bypassing generation concurrency.

Errors

Handle 429, 503, and 504 responses with the right retry behavior.
