How the limit works
Each plan gets a sustained request rate — a number of requests per second — with a short burst allowance, so a brief spike is not punished. The limit is measured per organization, not per API key, so issuing more keys does not raise it. Reads and writes draw on the same budget.
The exact numbers depend on your plan and can change, so they live in one place rather than being restated here: the pricing page lists the rate for each tier, and your dashboard shows the rate and burst in effect for your organization right now.
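A sustained rate with a burst allowance is commonly modelled as a token bucket. The sketch below is a client-side illustration of that idea, useful for pacing your own requests. It is not a description of how the service enforces the limit: the class name `TokenBucket` and the numbers in the usage note are assumptions, and your real rate and burst are the ones shown in the dashboard.

```python
import time
from typing import Optional


class TokenBucket:
    """Client-side model of a sustained rate with a burst allowance (illustrative)."""

    def __init__(self, rate_per_sec: float, burst: int, now: Optional[float] = None):
        self.rate = rate_per_sec          # tokens added per second (sustained rate)
        self.capacity = float(burst)      # most tokens the bucket can hold (burst)
        self.tokens = float(burst)        # start full, so an initial spike is allowed
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With a hypothetical rate of 2 requests per second and a burst of 3, three back-to-back calls to `try_acquire()` succeed, the fourth fails, and another succeeds half a second later once one token has refilled.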
When you go over
When you exceed the rate, the API responds with 429 Too Many Requests. Every response carries standard headers, so you always know where you stand:
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before trying again. Honour this first. |
| RateLimit-Limit | Your burst capacity: the most requests allowed at once. |
| RateLimit-Remaining | Requests left in the current window. |
| RateLimit-Reset | Seconds until the budget refills to full. |
A 429 caused by your plan cap also carries RateLimit-Scope: tenant, which distinguishes it from the coarse anti-flood limit applied at the edge. You may occasionally see 503 Service Unavailable with a short Retry-After when the platform is momentarily at capacity. Handle it the same way as a 429: wait for Retry-After, then retry.
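The header rules above can be sketched as two small helpers that work on a raw status code and a header mapping. The function names `wait_seconds` and `is_plan_cap`, and the one-second fallback used when Retry-After is missing or unreadable, are assumptions made for illustration and are not part of any SDK.

```python
from typing import Mapping, Optional


def _get(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup (HTTP header names are case-insensitive)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value.strip()
    return None


def is_plan_cap(status: int, headers: Mapping[str, str]) -> bool:
    """True when a 429 comes from the plan cap rather than the edge limit."""
    scope = _get(headers, "RateLimit-Scope")
    return status == 429 and scope is not None and scope.lower() == "tenant"


def wait_seconds(status: int, headers: Mapping[str, str],
                 default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a throttle."""
    if status not in (429, 503):
        return None
    raw = _get(headers, "Retry-After")
    try:
        # This documentation describes Retry-After as a number of seconds.
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to an assumed default.
        return default
```

Treating 429 and 503 through the same function keeps the calling code to a single branch, which matches the advice to handle both the same way.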
Handling it
The rule is simple: on a 429, wait for the Retry-After value, then retry — and back off exponentially if it keeps coming. The SDKs surface the HTTP status and headers on a structured error, so you can branch on a throttle without parsing messages.
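As a minimal sketch of that rule, the loop below waits for Retry-After and lets the wait grow exponentially when throttling persists. It does not use the SDKs: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and the attempt limit, base delay, cap, and jitter are assumed values you would tune yourself.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def _retry_after(headers: Mapping[str, str]) -> float:
    """Read Retry-After as seconds; return 0.0 if it is absent or not numeric."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return 0.0
    return 0.0


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    base_delay: float = 0.5,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying on 429/503 until it succeeds or attempts run out."""
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers, _body = response
        if status not in (429, 503):
            return response
        # Exponential backoff: base, 2x base, 4x base, ... up to max_delay.
        backoff = min(max_delay, base_delay * (2 ** attempt))
        # Honour Retry-After first: never wait less than the server asked.
        delay = max(_retry_after(headers), backoff)
        # Small random jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0.0, 0.1 * delay))
        response = send()
    return response  # still throttled after the last attempt; caller decides
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the delays instead of actually waiting. If every attempt is throttled, the last response is returned unchanged so the caller can log it or surface an error.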
Rate limiting only throttles the cloud mirror. Writes to your agent’s local signed chain are never refused by a 429: receipts keep being written locally and stay verifiable offline. The throttle only delays mirroring them to the cloud; it never blocks the action or breaks the chain.
