Troubleshooting

"I keep getting 401 even though my key is correct"

Verify you sent it as Authorization: Bearer esk_live_... (with the Bearer prefix).
Verify the key isn't revoked (Dashboard > API Keys).
Verify the key contains no stray whitespace (shell quoting around curl can leak a trailing newline).
Test directly: curl https://api.epithre.com/v1/models -H "Authorization: Bearer $KEY". This endpoint is public and requires no auth, but it doesn't reject valid Authorization headers. If you get a clean 200 with the model list, your URL and connectivity work.
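The whitespace and prefix checks above can be automated before any request is sent. A minimal sketch; build_auth_header is a hypothetical helper written for illustration, not part of any Epithre SDK:

```python
def build_auth_header(raw_key: str) -> dict[str, str]:
    """Build the Authorization header, rejecting the common key mistakes."""
    if not raw_key:
        raise ValueError("key is empty")
    if raw_key.lower().startswith("bearer "):
        # The variable already holds "Bearer ..."; adding it again would double the prefix.
        raise ValueError("key already contains the Bearer prefix; pass the bare key")
    if any(ch.isspace() for ch in raw_key):
        # A trailing newline is typical when the key was read from a file or a shell command.
        raise ValueError("key contains whitespace (often a trailing newline)")
    return {"Authorization": f"Bearer {raw_key}"}
```

Calling it with a key read via os.environ or from a file surfaces the problem locally instead of as a 401.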

"My streaming output arrives in chunks of 50+ tokens, not per-token"

A buffering proxy sits between you and the API. Because the buffering happens upstream of your client, you can't fix it in client code, but you can narrow down where it is. Try:

Switch to a different network (mobile hotspot to test).
Test directly against api.epithre.com (not a proxy you set up).
If you're behind your own nginx proxy, set proxy_buffering off; for the streaming route.

We set X-Accel-Buffering: no server-side, which most modern proxies respect.

"I see 429 backend_busy frequently"

This is shared-pool back-pressure across all Epithre customers, not your per-key limit.

On epithre-omni: as of May 2026, short bursts are queued for up to 45 seconds before returning 429. If you still see backend_busy, the pool has been saturated for more than 45s straight; back off 5-10s with jitter. If this happens on more than ~1% of requests, email us: at that rate the fix is raising the cap, not retrying.
On epithre-prme / epithre-lyt: there is no queueing, so retry after 1-3s with jitter.
In all cases: set your HTTP client read timeout to at least 90s so the queue wait doesn't trip a client-side timeout. The OpenAI SDK default (10 minutes) is already safe.
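The per-model backoff guidance above could be encoded like this. A sketch only: backend_busy_delay and the constant names are illustrative, and the windows are simply the numbers quoted in this section.

```python
import random
from typing import Optional

# Backoff windows in seconds, copied from the guidance above.
BACKOFF_WINDOWS = {
    "epithre-omni": (5.0, 10.0),
    "epithre-prme": (1.0, 3.0),
    "epithre-lyt": (1.0, 3.0),
}

# Read timeout that leaves room for the 45s server-side queue.
READ_TIMEOUT_SECONDS = 90.0


def backend_busy_delay(model: str, rng: Optional[random.Random] = None) -> float:
    """Pick a jittered sleep (seconds) before retrying a 429 backend_busy."""
    low, high = BACKOFF_WINDOWS[model]
    rng = rng or random.Random()
    # Uniform jitter across the window keeps clients from retrying in lockstep.
    return rng.uniform(low, high)
```

Sleep for the returned value before the next attempt, and pass READ_TIMEOUT_SECONDS (or more) as your HTTP client's read timeout.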

"I see 429 concurrency_exceeded"

You've hit your per-key concurrency cap (default 10). Either:

Reduce client-side parallelism (semaphore).
Raise the cap in the dashboard. Up to ~100 is supported per key.
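For the first option, a semaphore around each request is usually enough. A minimal asyncio sketch; run_bounded is a hypothetical helper, and the coroutines you pass in stand for whatever client call you already make:

```python
import asyncio

MAX_IN_FLIGHT = 10  # matches the default per-key cap described above


async def run_bounded(coros, limit: int = MAX_IN_FLIGHT):
    """Await all coroutines, keeping at most `limit` of them in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        # Each coroutine only starts once it holds a semaphore slot.
        async with sem:
            return await coro

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(c) for c in coros))
```

Setting the limit a little below your cap leaves headroom for other processes that share the same key.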

"Latency is suddenly slow"

Possible causes:

Long prompt: prompt processing time grows superlinearly with context length. A 50K-token prompt takes much longer than a 5K-token one.
Long output: generation speed is ~30-50 tokens per second on Omni, so a 4000-token reply takes ~80-130s.
Backend transient slowness: rare but happens during high load. Email us with your request ID; we can correlate.
tool_choice="required" slowness on long prompts (updated May 2026): the prior long-prompt stall is fixed across all backends. "required" now works reliably on epithre-omni and epithre-prme. For low-latency paths, "auto" and named tool choice still have slightly lower TTFT.
response_format json_schema on long prompts: works on all backends post-fix. Fall back to json_object only if you hit a rare edge case at very long inputs (>10K tokens combined with a deeply nested schema).
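To judge whether a slow reply is simply a long one, the "Long output" arithmetic above can be applied to your own max_tokens. A sketch using the quoted 30-50 tokens/s range for Omni; the function name is illustrative, and prompt processing time is not included:

```python
def estimate_generation_seconds(
    output_tokens: int, tps_low: float = 30.0, tps_high: float = 50.0
) -> tuple[float, float]:
    """Return (fastest, slowest) expected generation time in seconds."""
    # Fastest case uses the high tokens-per-second figure, slowest the low one.
    return output_tokens / tps_high, output_tokens / tps_low


fastest, slowest = estimate_generation_seconds(4000)
# fastest is 80.0 s and slowest is about 133 s, matching the ~80-130s quoted above.
```

If observed latency falls well outside this range for the reply length you got, the other causes in the list become more likely.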

"JSON output is malformed or has `finish_reason: length`"

Increase max_tokens.
If you use strict json_schema and the schema is very complex (deep nesting, many required fields), the model may run out of tokens before completing the object. Simplify the schema or increase max_tokens.
Fall back to json_object mode with a more forgiving validator on your side.
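Distinguishing a truncated reply from a genuinely malformed one tells you which of the fixes above applies. A sketch that assumes an OpenAI-compatible choice object (finish_reason plus message.content); that shape and the function name are assumptions, not confirmed by this page:

```python
import json


def classify_json_reply(choice: dict) -> tuple[str, object]:
    """Return ("ok", data), ("truncated", None) or ("malformed", None)."""
    if choice.get("finish_reason") == "length":
        # The model hit max_tokens; the JSON is cut off, so parsing is pointless.
        return "truncated", None
    content = (choice.get("message") or {}).get("content") or ""
    try:
        return "ok", json.loads(content)
    except json.JSONDecodeError:
        return "malformed", None
```

On "truncated", retry with a larger max_tokens; on "malformed", consider a simpler schema or json_object mode with a more lenient validator.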

"Embeddings I created yesterday don't match similar embeddings I create today"

Did you change the instruction field? Different instruction = different vector.
Did you switch between text and image input? Different modalities map to different (but cosine-comparable) regions of the space.
If you embedded before 2026-05-14 (our VL migration), those vectors are in the old text-only space. Re-embed after the migration to match new vectors.
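If you store the embedding date and instruction next to each vector, you can check whether two vectors should be expected to match before comparing them. A sketch; EmbeddingMeta and comparable are hypothetical names, and treating vectors created on 2026-05-14 itself as post-migration is an assumption:

```python
from dataclasses import dataclass
from datetime import date

VL_MIGRATION_DATE = date(2026, 5, 14)


@dataclass(frozen=True)
class EmbeddingMeta:
    embedded_on: date
    instruction: str


def comparable(a: EmbeddingMeta, b: EmbeddingMeta) -> bool:
    """True when both vectors share a space and an instruction."""
    # Vectors from before the migration live in the old text-only space.
    a_old = a.embedded_on < VL_MIGRATION_DATE
    b_old = b.embedded_on < VL_MIGRATION_DATE
    return a_old == b_old and a.instruction == b.instruction
```

Modality is deliberately left out of the check, since text and image vectors remain cosine-comparable.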

"Files API upload returns 413"

You're over a body-size or quota limit:

Per file: 100 MB max
Per user storage quota: 1 GB total
Request body: 50 MB max on /v1/files

Delete old files first via DELETE /v1/files/{id}, or raise the quota by emailing us.
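You can check a file against these limits before uploading. A sketch; upload_problems is a hypothetical helper, and reading "MB" as decimal megabytes is an assumption (the more conservative one):

```python
MB = 1_000_000  # assumption: decimal megabytes
PER_FILE_LIMIT = 100 * MB
REQUEST_BODY_LIMIT = 50 * MB
STORAGE_QUOTA = 1000 * MB  # 1 GB


def upload_problems(file_size: int, used_storage: int) -> list[str]:
    """List every limit from the section above that this upload would break."""
    problems = []
    if file_size > PER_FILE_LIMIT:
        problems.append("file exceeds the 100 MB per-file limit")
    if file_size > REQUEST_BODY_LIMIT:
        # The real request body is slightly larger than the file (multipart overhead).
        problems.append("file exceeds the 50 MB request body limit on /v1/files")
    if used_storage + file_size > STORAGE_QUOTA:
        problems.append("upload would exceed the 1 GB storage quota")
    return problems
```

An empty list means none of the three documented limits should trigger a 413.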

"Batch is stuck in `validating`"

The worker checks every 5 seconds. If you create a batch and immediately fetch it, you'll see validating. Wait 5-10 seconds; it should transition to in_progress.

If it is still validating after a minute, check the input file's lines for malformed JSON. The worker rejects the whole batch if it can't parse the input.
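Since one bad line rejects the whole batch, it is worth checking the input file locally first. A minimal sketch; find_malformed_lines is an illustrative helper, and it skips blank lines because this page does not say how the worker treats them:

```python
import json


def find_malformed_lines(jsonl_text: str) -> list[tuple[int, str]]:
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored here
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((lineno, exc.msg))
    return bad
```

Run it on the file contents before uploading; an empty result means every non-blank line parsed.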

"Webhook isn't firing"

Check GET /v1/webhooks - confirm the webhook is registered and status: active.
Check last_delivery_at - did any past deliveries succeed?
Check last_error - errors are recorded here.
Verify your URL is HTTPS and returns 2xx within 10s.
Verify your URL is publicly reachable (not behind VPN, no IP allowlist blocking us).
If your URL is behind Cloudflare or similar, note that our requests come from Jakarta IPs; make sure geo-blocking or bot rules don't drop them.
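The first few checks can be run against one entry from GET /v1/webhooks. A sketch under assumptions: the field names status, last_delivery_at and last_error come from the list above, but the url field and the overall record shape are guesses, and diagnose_webhook is a hypothetical helper:

```python
def diagnose_webhook(hook: dict) -> list[str]:
    """Return human-readable findings for one webhook record."""
    findings = []
    if hook.get("status") != "active":
        findings.append(f"status is {hook.get('status')!r}, expected 'active'")
    # "url" is an assumed field name for the registered endpoint.
    if not str(hook.get("url", "")).startswith("https://"):
        findings.append("url is not HTTPS")
    if hook.get("last_error"):
        findings.append(f"last_error: {hook['last_error']}")
    if not hook.get("last_delivery_at"):
        findings.append("no delivery has been recorded yet")
    return findings
```

Reachability and response time still have to be tested from outside your network; this only reads what the API already reports.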

"Cache hit rate is lower than expected"

Confirm cache_control is on the message's content block, not the message itself.
Confirm the prefix is >100 tokens (below this, the cache marker is ignored).
Check for differences in the prefix: user IDs, timestamps, session-specific data in the system prompt cause cache misses. Move those out of the cached portion.
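The first check, marker placement, is easy to get wrong and easy to lint. A sketch that inspects a messages list before sending; cache_marker_report is a hypothetical helper, and it assumes content blocks are dicts inside a list-valued content field:

```python
def cache_marker_report(messages: list[dict]) -> dict[str, list[int]]:
    """Report which message indexes carry cache_control, and at which level."""
    report = {"message_level": [], "block_level": []}
    for i, msg in enumerate(messages):
        if "cache_control" in msg:
            # Misplaced: the marker sits on the message, not on a content block.
            report["message_level"].append(i)
        content = msg.get("content")
        if isinstance(content, list) and any(
            isinstance(block, dict) and "cache_control" in block for block in content
        ):
            report["block_level"].append(i)
    return report
```

Any index under message_level points at a marker in the wrong place; an empty block_level list means no correctly placed marker was found.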

"Iris keeps returning content_policy_violation"

epithre-iris has safety filters:

CSAM keyword + LLM classifier
Deepfake-of-public-figure filter
General NSFW keyword filter

If you believe the rejection is a false positive on your legitimate use case, email us with the prompt. We refine the filters periodically.

"I'm reaching my Rp50,000 free credit, can I get more for testing?"

Email hello@epithre.com with subject "Extended trial". Include a description of your use case. We routinely grant additional credit for serious evaluation.

How to report a bug

Include:

X-Request-ID from response headers
HTTP status + error JSON
Approximate timestamp UTC
Your account email + API key prefix (not the full key)
The smallest reproducing request body

Email: hello@epithre.com with subject "Bug report".