Changelog
2026-05-25
epithre-omnireliability fix for high-concurrency agentic workloads. Fixed a backend-pool accounting bug where an abruptly-disconnected streaming request could fail to release its slot — over time this leaked slots and caused spuriousbackend_busyrejections even when the backend was free. Separately, the upstream timeout for theomnitier was raised so long-running agentic chains (large accumulated context legitimately exceeds the old limit under concurrency) are no longer cut off mid-stream. Net effect: heavy concurrent bursts now degrade gracefully — excess requests receive a clean, retryable429 backend_busyinstead of dropped connections. Client guidance for agentic / large-context workloads: set your HTTP read timeout to ≥240s, keep client concurrency conservative (large-context calls are throughput-intensive — more in-flight mostly adds queueing, not throughput), and retry on429 backend_busywith a short backoff. See Rate limits.
2026-05-22
epithre-omnibackend capacity expanded further. Aggregate concurrent cap raised again following sustained stress validation — ~50% more concurrent headroom on top of the May 17 increase. Bursty workloads previously bottlenecked at peak are now absorbed transparently. Combined with the May 20 queue grace, short bursts very rarely touch the cap. No client change needed.
2026-05-21 — Currency migration: USD → IDR
- All pricing now in IDR (Indonesian Rupiah). Existing balances converted at 17,600 IDR/USD on this date.
- API field rename:
cost_usd,amount_usd,monthly_usd_cap,credit_balance_usd,input_per_mtok_usd,output_per_mtok_usd,per_unit_usd→ corresponding*_idrfields. Webhook batch payload fieldtotal_cost_usd→total_cost_idr. - Tier B premium pricing applied (audit-validated against May 2026 market — Anthropic/OpenAI/Together AI/Replicate). See Pricing.
- Signup credit: Rp50,000 (was $5).
- Default monthly cap per key: Rp1,000,000 (was $100).
2026-05-20
epithre-omnibackend queue. When the shared Omni pool is saturated, requests now wait up to 45 seconds for a slot to free before returningHTTP 429 backend_busy— instead of rejecting immediately. Short bursts that previously triggered 429 spam are absorbed as a small first-byte latency increase. PRME and LYT behavior unchanged (still immediate 429). No client change needed; the response shape and code on actual rejection are identical. Make sure your HTTP client read timeout is at least 90s. See Rate limits and Troubleshooting.
2026-05-19
tool_choice="required"reliability fix onepithre-omni. Prior long-prompt stall (>5K tokens combined with strict tool-call enforcement) is resolved. Bare"required", named tool choice, and"auto"all work reliably across the full prompt-length range.response_format: json_schemastrict mode benefits from the same fix. No client change needed.
2026-05-17
epithre-omnibackend capacity expanded. Aggregate concurrent cap raised significantly on the flagship tier.429 backend_busyrates drop for bursty workloads. No client change needed./v1/rerankdefensive auto-truncate. Documents longer than 6000 characters are now server-side clamped to fit the reranker's input window, instead of returning422. Mirrors the auto-truncate behavior already on/v1/embeddings. Transparent — no response shape change.
2026-05-15
- Credit balance moved to account level. Single pool per account, drained by every key. Topping up no longer requires picking a key. Matches Anthropic / OpenAI / DeepSeek pattern. Existing balances were merged: sum of your active keys' previous balances is now your account balance. No action needed.
monthly_idr_capper key now enforced (was cosmetic). When a key hits its monthly IDR cap, it returnsHTTP 402 monthly_cap_exceeded. Other keys keep working. Default cap Rp1,000,000. Raise to 0 for no per-key cap.- Admin dashboard: per-key Edit button (rpm / rpd / concurrency / monthly cap); credit Top-up moved to user-level button.
/admin/system-healthnow reports embed and rerank as separate rows.
2026-05-14
- Multi-page docs (this site). Split from single-page to per-topic structure.
/v1/retrievalendpoint. Upload knowledge files (PDF/TXT/MD), search via cosine over chunked corpus. Turnkey RAG.- Files API extended with
purpose=knowledgefor retrieval ingest. - Knowledge processor async worker: extracts text, chunks recursively, embeds, indexes.
- Embed model upgraded to a multimodal backbone. Text and image vectors now share a 4000-dim space. Cross-modal retrieval enabled.
- Image embed exposed via
/v1/embeddingswith{"type": "image", ...}input items. Mixed batch supported. - Auto-truncate on
/v1/embeddings: text >10K chars auto-clamped to head. Opt out viatruncate: "NONE". - Prompt caching with explicit
cache_controlmarkers. 1.25x write, 0.1x read. - Batch API at 50% off.
- Webhooks for batch terminal events. HMAC-signed, exponential backoff retry.
- Structured output via
response_formatjson_schema strict mode. - Light-theme docs + dashboard redesign.
- Legal pages v1.1: full retention table, UU 27/2022 PDP reference, DPA mention.
- PRME context corrected: advertised 200K -> actual 180K (matches backend cap).
- Pricing table split:
epithre-embed-imagerow added at Rp25 / image.
2026-05-13
- Embed + rerank backend hardware upgrade. No customer-visible API change; throughput headroom improved.
- Iris (
epithre-iris) safety v2: layered CSAM + deepfake + LLM classifier filters.
2026-05-11
- Omni capacity expanded. Throughput headroom increased for the flagship chat tier.
- Initial Epithre Platform launch (P0-P6): all chat / embed / rerank / image endpoints live, auth + billing + dashboard + admin + email verification + signup credit.
2026-05-09
- PRME tier launched. Long-context premium chat (180K context window). Sister model to the flagship
epithre-omnifor long-document and codebase workloads.
Pre-launch
- Stack design and infrastructure work. Not customer-visible.
How to follow
- Subscribe to release announcements via email:
hello@epithre.comwith subject "Subscribe changelog". - Webhook events
platform.updateis on the roadmap.
Reporting a regression
If something used to work and now doesn't, email us with: what you were doing, when it started failing, request ID if you have one. We treat regressions as P0 during alpha.