# Rate limits

> Concurrent generation limits for Breeze API plans.

Breeze applies plan-based concurrent generation limits to protect long-running synthesis capacity. Standard API requests are not currently throttled by a published request-per-minute quota.

## Concurrent generations

Concurrent generations are synthesis jobs that are admitted or running. Queued generations are jobs that wait briefly in the background for admitted capacity to free up. Studio and Developer API jobs share the same per-user pool.

Text to speech, streaming text to speech, async text to speech, voice design, and voice cloning all count toward these limits. Voice design counts each requested preview as one generation. When `preview_count` is omitted, Breeze generates one preview.
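
As a rough illustration of the counting rule above, the sketch below estimates how many generation slots a single request occupies. The `kind` labels and the function name are hypothetical and not part of the Breeze API; only `preview_count` and its default of one preview come from this page. It also assumes that every non-voice-design request counts as exactly one generation, which this page implies but does not state.

```python
from typing import Optional


def generations_consumed(kind: str, preview_count: Optional[int] = None) -> int:
    """Estimate how many concurrent-generation slots one request occupies.

    `kind` is a hypothetical label for the request type, used only in this sketch.
    """
    if kind == "voice_design":
        # Each requested preview counts as one generation.
        # When preview_count is omitted, one preview is generated.
        return 1 if preview_count is None else preview_count
    # Assumption: all other generation types count as a single generation.
    return 1
```

For example, a voice design request with `preview_count=3` would occupy three slots under this model, so on the Free plan it would use the whole concurrent allowance by itself.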

| Plan    | Concurrent generations | Queued generations |
| ------- | ---------------------: | -----------------: |
| Free    |                      3 |                  0 |
| Starter |                      6 |                  2 |
| Creator |                     10 |                  3 |
| Pro     |                     20 |                  5 |

When both admitted and queued generation capacity are full, the API returns `429 GENERATION_CONCURRENCY_EXCEEDED`. When Breeze's shared generation capacity is temporarily unavailable, the API returns `503 GENERATION_CAPACITY_EXCEEDED`.
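
The per-plan behavior described above can be modeled client-side, for example to predict whether a new job is likely to be admitted, queued, or rejected. This is a minimal sketch for reasoning about the table, not Breeze's actual admission logic: `PlanLimits`, `PLAN_LIMITS`, and `admission_outcome` are hypothetical names, and the model ignores the separate `503 GENERATION_CAPACITY_EXCEEDED` case, which depends on shared capacity rather than on your plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    concurrent: int  # generations that may be admitted or running at once
    queued: int      # generations that may wait for admitted capacity


# Values copied from the plan table on this page.
PLAN_LIMITS = {
    "free": PlanLimits(concurrent=3, queued=0),
    "starter": PlanLimits(concurrent=6, queued=2),
    "creator": PlanLimits(concurrent=10, queued=3),
    "pro": PlanLimits(concurrent=20, queued=5),
}


def admission_outcome(plan: str, active: int, waiting: int) -> str:
    """Predict what happens to one new generation for a user on `plan`.

    `active` is the number of admitted or running generations and
    `waiting` is the number already queued.
    """
    limits = PLAN_LIMITS[plan]
    if active < limits.concurrent:
        return "admitted"
    if waiting < limits.queued:
        return "queued"
    # Both admitted and queued capacity are full.
    return "429 GENERATION_CONCURRENCY_EXCEEDED"
```

Because Studio and Developer API jobs share one per-user pool, `active` and `waiting` would need to include jobs started from both surfaces for the prediction to be meaningful.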

## Best practices

* Use exponential backoff with jitter for `429 GENERATION_CONCURRENCY_EXCEEDED` and transient `5xx` responses.
* Group text into natural requests instead of sending one word at a time.
* Treat async text to speech as a background job and poll for completion. Async delivery does not bypass concurrent generation limits.
* Cache responses when appropriate. Regenerating identical input is billed again.
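
The first practice above, exponential backoff with jitter, could be sketched as follows. This is an illustrative helper, not an official Breeze client: `call_with_retry`, `backoff_delay`, and the `send` callable are hypothetical, and the set of status codes treated as retryable, the base delay, and the cap are assumptions to tune for your workload. The sketch checks only the HTTP status, so a production version might also inspect the error code in the response body.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: 429 plus these 5xx codes are treated as transient.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` performs one request and returns (status_code, body).
    The last response is returned even if it is still an error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        # Wait a randomized, exponentially growing interval before retrying.
        sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable")
```

Passing `send` as a callable keeps the retry policy independent of any particular HTTP library, and the randomized delay keeps many clients that were rejected at the same moment from retrying in lockstep.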

## Design around limits

<Columns cols={2}>
  <Card title="Pricing" icon="credit-card" href="/concepts/pricing">
    See how plan limits, queued generations, and shared Studio/API credits fit together.
  </Card>

  <Card title="Text to speech" icon="mic" href="/guides/text-to-speech">
    Use async jobs for longer text and retry only the failed segments of a batch.
  </Card>

  <Card title="Streaming" icon="radio" href="/concepts/streaming">
    Choose streaming for lower time-to-first-byte without bypassing generation concurrency.
  </Card>

  <Card title="Errors" icon="triangle-alert" href="/reference/errors">
    Handle `429`, `503`, and `504` responses with the right retry behavior.
  </Card>
</Columns>
