Rate limits

Two layers of limits apply to every request.

Per-key limits (yours to control)

Limit                       Default        Where to change
Requests per minute (RPM)   60             Dashboard > Keys > Edit
Requests per day (RPD)      10,000         Dashboard > Keys > Edit
Concurrent requests         10             Dashboard > Keys > Edit
Monthly spend cap           Rp1,000,000    Dashboard > Keys > Edit

Exceeding any of these returns HTTP 429 with a code indicating which limit was hit.
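To stay under the per-key RPM and concurrency limits before the server has to reject anything, a client can track its own usage. The sketch below is a minimal, hypothetical client-side guard: `KeyLimiter`, `try_acquire`, and `release` are illustrative names, not part of any Epithre SDK. It assumes a sliding 60-second window, which may not match exactly how the server counts RPM, and it does not track RPD or the monthly spend cap.

```python
import threading
import time
from collections import deque
from typing import Callable


class KeyLimiter:
    """Hypothetical client-side guard for one key's RPM and concurrency limits.

    Defaults mirror the documented per-key defaults (60 RPM, 10 concurrent).
    Assumes a sliding 60-second window; the server's own counting may differ.
    """

    def __init__(self, rpm: int = 60, max_concurrent: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock
        self._starts = deque()  # start times of requests in the last 60s
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self) -> bool:
        """Reserve a slot, or return False if either limit would be exceeded."""
        if not self._slots.acquire(blocking=False):
            return False  # already at max concurrency
        with self._lock:
            now = self.clock()
            # Forget request starts that have left the 60-second window.
            while self._starts and now - self._starts[0] >= 60.0:
                self._starts.popleft()
            if len(self._starts) >= self.rpm:
                self._slots.release()  # hand back the concurrency slot
                return False
            self._starts.append(now)
        return True

    def release(self) -> None:
        """Call once per successful try_acquire, after the request finishes."""
        self._slots.release()
```

Pair every successful `try_acquire()` with a `release()` (for example in a `finally` block) so a failed request does not leak a concurrency slot. If you raise a key's limits in the dashboard, pass the new values to the constructor.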

You can raise these in the dashboard at any time. For production workloads, typical settings:

Email hello@epithre.com if you need more than 1,000 RPM.

Backend capacity (shared)

Each chat model has an aggregate concurrency cap shared by all Epithre customers, sized to real serving capacity with headroom reserved for IsonAI internal services. Behavior on saturation differs by model:

Model           On saturation
epithre-omni    Queued for up to 45 seconds, then HTTP 429 backend_busy if no slot frees up
epithre-prme    Immediate HTTP 429 backend_busy (generations are long-tail, so queueing rarely helps)
epithre-lyt     Immediate HTTP 429 backend_busy

Practical effect for Omni: most completions take 10-30s, so a slot usually frees up well within the 45-second queue window. Short customer-side bursts are therefore absorbed transparently as a small increase in time to first byte instead of a 429. You should rarely see backend_busy on Omni unless saturation lasts longer than 45 seconds.
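One consequence of the Omni queue is that a client timeout shorter than the queue wait plus generation time can abandon a request that would have succeeded. The helper below is a hypothetical sketch of that budget: `read_timeout_s`, the 30s default (the upper end of the typical 10-30s completion time), and the 5s `margin_s` are illustrative assumptions, not documented values.

```python
OMNI_QUEUE_S = 45.0  # documented maximum queue wait on epithre-omni


def read_timeout_s(model: str, expected_generation_s: float = 30.0,
                   margin_s: float = 5.0) -> float:
    """Suggest a client timeout that won't cut off a queued request (hypothetical).

    On epithre-omni a request may wait up to 45s for a slot before generation
    starts, so the budget is queue + generation + margin. epithre-prme and
    epithre-lyt return backend_busy immediately, so no queue allowance is added.
    """
    queue_s = OMNI_QUEUE_S if model == "epithre-omni" else 0.0
    return queue_s + expected_generation_s + margin_s
```

With the defaults this gives 80s for `epithre-omni` and 35s for the other models; raise `expected_generation_s` for workloads with long outputs.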

If you do see backend_busy regularly on Omni, email us; sustained saturation means the pool cap needs raising rather than queueing harder.

Header inspection

Currently we don't surface X-RateLimit-* headers in responses. The recommended pattern is:

  1. Get a 429 response.
  2. Check error.code: rpm_exceeded, rpd_exceeded, concurrency_exceeded, or backend_busy.
  3. Back off according to the code:
     a. rpm_exceeded: short backoff (1-2s), then retry.
     b. rpd_exceeded: long backoff. Raise the daily cap or wait until UTC midnight.
     c. concurrency_exceeded: reduce parallelism client-side.
     d. backend_busy: on Omni this means sustained saturation (the 45s queue has already expired), so back off 5-10s. On PRME/LYT, retry after 1-3s.
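The recovery strategy above can be sketched as a single lookup from error code to retry delay. This is a minimal illustration, assuming you have already parsed the error.code string out of the 429 response body; `retry_delay_s` and `seconds_until_utc_midnight` are hypothetical helpers. Random delays within each documented range add jitter so that many clients don't retry in lockstep.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_utc_midnight(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (timezone-aware, UTC) to the next UTC midnight."""
    if now is None:
        now = datetime.now(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now).total_seconds()


def retry_delay_s(model: str, error_code: str,
                  now: Optional[datetime] = None) -> Optional[float]:
    """Map a 429 error.code to a delay in seconds before retrying.

    Returns None for concurrency_exceeded (the fix is fewer parallel
    requests, not a timed retry) and for any unrecognized code.
    """
    if error_code == "rpm_exceeded":
        return random.uniform(1.0, 2.0)
    if error_code == "rpd_exceeded":
        # Alternatively, raise the daily cap in the dashboard.
        return seconds_until_utc_midnight(now)
    if error_code == "backend_busy":
        if model == "epithre-omni":
            return random.uniform(5.0, 10.0)  # the 45s queue already expired
        return random.uniform(1.0, 3.0)
    return None
```

For rpd_exceeded a day-long sleep is rarely what you want in practice; treat the returned value as a signal to stop sending requests on that key until the reset.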

Multiple keys

Create as many keys as you want from the dashboard. All keys on the same account draw from a single shared credit balance, but each key has its own independent RPM, RPD, concurrency, and monthly spend caps. Useful for:

Keys are revoked independently. If one leaks, revoke just that key; the others keep working against the same balance.
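One way to benefit from independent revocation is to give each workload its own key, so a leak is contained to one place. The sketch below assumes keys are supplied through environment variables; the workload names and variable names (`EPITHRE_KEY_WEB` and so on) are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical workload names and environment variables; adapt to your setup.
WORKLOAD_KEY_VARS = {
    "web": "EPITHRE_KEY_WEB",
    "batch": "EPITHRE_KEY_BATCH",
    "staging": "EPITHRE_KEY_STAGING",
}


def key_for(workload: str) -> str:
    """Return the API key dedicated to one workload.

    Each workload reads its own variable, so replacing a leaked key means
    revoking it in the dashboard and updating one variable; the other
    workloads keep running on their own keys.
    """
    var = WORKLOAD_KEY_VARS[workload]  # KeyError for an unknown workload
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set for workload {workload!r}")
    return key
```

Because each key also has its own limits, this split keeps a burst in one workload (say, a batch job) from exhausting the RPM or concurrency available to another.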

See also