Troubleshooting
"I keep getting 401 even though my key is correct"
- Verify you sent it as
Authorization: Bearer esk_live_...(with theBearerprefix). - Verify the key isn't revoked (Dashboard > API Keys).
- Verify there's no whitespace (curl quoting can leak trailing newline).
- Test directly:
curl https://api.epithre.com/v1/models -H "Authorization: Bearer $KEY". This is a public endpoint that requires no auth, but doesn't reject valid Authorization headers; if you get a clean 200 with the model list, your URL + connectivity work.
"My streaming output arrives in chunks of 50+ tokens, not per-token"
A buffering proxy is in between. From the client side: not much you can do (the buffering is upstream). Try:
- Switch to a different network (mobile hotspot to test).
- Test directly against
api.epithre.com(not a proxy you set up). - If you're behind your own nginx proxy, set
proxy_buffering off;for the streaming route.
We set X-Accel-Buffering: no server-side; respected by most modern proxies.
"I see 429 backend_busy frequently"
This is shared-pool back-pressure across all Epithre customers, not your per-key limit.
- On
epithre-omni: as of May 2026, short bursts are queued for up to 45 seconds before returning 429. If you still seebackend_busy, the pool has been saturated for >45s straight — back off 5-10s with jitter. If this happens more than ~1% of requests, email us; the cap needs raising rather than retrying. - On
epithre-prme/epithre-lyt: no queueing — retry after 1-3s with jitter. - In all cases: set your HTTP client read timeout to at least 90s so the queue wait doesn't trip a client-side timeout. The OpenAI SDK default (10 minutes) is already safe.
"I see 429 concurrency_exceeded"
Your per-key concurrency cap (default 10) is hit. Either:
- Reduce client-side parallelism (semaphore).
- Raise the cap in the dashboard. Up to ~100 is supported per key.
"Latency is suddenly slow"
Possible causes:
- Long prompt: tokens grow superlinearly with context. A 50K-token prompt takes much longer than 5K.
- Long output: tokens-per-second of generation is ~30-50 on Omni. A 4000-token reply takes ~80-130s.
- Backend transient slowness: rare but happens during high load. Email us with your request ID; we can correlate.
tool_choice="required"slowness on long prompts (updated May 2026): the prior long-prompt stall is fixed across all backends."required"now works reliably onepithre-omniandepithre-prme. For low-latency paths,"auto"and named tool choice still have slightly lower TTFT.response_formatjson_schema on long prompt: works on all backends post-fix. Fall back tojson_objectonly if you hit a rare edge case at very long inputs (>10K tokens combined with deeply nested schema).
"JSON output is malformed or has finish_reason: length"
- Increase
max_tokens. - If using strict json_schema and the schema is very complex (deep nesting, many required fields), the model may run out of tokens before completing. Simplify schema or increase max_tokens.
- Fall back to
json_objectmode with a more forgiving validator on your side.
"Embeddings I created yesterday don't match similar embeddings I create today"
- Did you change the
instructionfield? Different instruction = different vector. - Did you switch between text and image input? Different modalities map to different (but cosine-comparable) regions of the space.
- If you embedded before 2026-05-14 (our VL migration), those vectors are in the old text-only space. Re-embed after the migration to match new vectors.
"Files API upload returns 413"
You're over a body-size or quota limit:
- Per file: 100 MB max
- Per user storage quota: 1 GB total
- Request body: 50 MB max on
/v1/files
Delete old files first via DELETE /v1/files/{id}, or raise the quota by emailing us.
"Batch is stuck in validating"
The worker checks every 5 seconds. If you create a batch and immediately fetch it, you'll see validating. Wait 5-10 seconds; should transition to in_progress.
If still validating after a minute, check the input file's lines for malformed JSON. The worker rejects the whole batch if it can't parse the input.
"Webhook isn't firing"
- Check
GET /v1/webhooks- confirm the webhook is registered andstatus: active. - Check
last_delivery_at- did any past deliveries succeed? - Check
last_error- errors are recorded here. - Verify your URL is HTTPS, returns 2xx within 10s.
- Verify your URL is publicly reachable (not behind VPN, no IP allowlist blocking us).
- If your URL is behind Cloudflare or similar, our requests come from Jakarta IPs.
"Cache hit rate is lower than expected"
- Confirm
cache_controlis on the message's content block, not the message itself. - Confirm prefix is >100 tokens (below this, cache marker is ignored).
- Check for differences in the prefix: user IDs, timestamps, session-specific data in the system prompt cause cache misses. Move those out of the cached portion.
"Iris keeps returning content_policy_violation"
epithre-iris has safety filters:
- CSAM keyword + LLM classifier
- Deepfake-of-public-figure filter
- General NSFW keyword filter
If you believe the rejection is a false positive on your legitimate use case, email us with the prompt. We refine the filters periodically.
"I'm reaching my Rp50,000 free credit, can I get more for testing?"
Email hello@epithre.com with subject "Extended trial". Include a description of your use case. We routinely grant additional credit for serious evaluation.
How to report a bug
Include:
X-Request-IDfrom response headers- HTTP status + error JSON
- Approximate timestamp UTC
- Your account email + API key prefix (not the full key)
- The smallest reproducing request body
Email: hello@epithre.com with subject "Bug report".