Changelog

2026-05-25

epithre-omni reliability fix for high-concurrency agentic workloads. Fixed a backend-pool accounting bug in which an abruptly disconnected streaming request could fail to release its slot. Over time this leaked slots and caused spurious backend_busy rejections even when the backend was free. Separately, the upstream timeout for the omni tier was raised so that long-running agentic chains are no longer cut off mid-stream; with a large accumulated context, such requests can legitimately exceed the old limit under concurrency. Net effect: heavy concurrent bursts now degrade gracefully, and excess requests receive a clean, retryable 429 backend_busy instead of dropped connections. Client guidance for agentic / large-context workloads: set your HTTP read timeout to ≥240s, keep client concurrency conservative (large-context calls are throughput-intensive, so more in-flight requests mostly add queueing, not throughput), and retry on 429 backend_busy with a short backoff. See Rate limits.
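The retry guidance above can be sketched roughly as follows. This is a minimal illustration, not an official client: call_with_retry, the send callable, the (status, error_code, body) tuple, and the backoff parameters are all hypothetical names and values chosen for the example. How the status and error code are extracted from a real response is left to the caller.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical result shape: (http_status, error_code, body).
# error_code is None on success; the caller fills it in from the response.
Result = Tuple[int, Optional[str], object]

# Changelog guidance: HTTP read timeout of at least 240s for agentic workloads.
# Pass this to whatever HTTP client you use (assumed constant name).
READ_TIMEOUT_S = 240


def call_with_retry(
    send: Callable[[], Result],
    max_attempts: int = 4,        # assumed value, tune for your workload
    base_delay_s: float = 1.0,    # assumed "short backoff" starting point
    sleep: Callable[[float], None] = time.sleep,
) -> Result:
    """Retry only on 429 backend_busy, with jittered exponential backoff."""
    result = send()
    for attempt in range(1, max_attempts):
        status, code, _ = result
        # Anything other than 429 backend_busy is returned to the caller as-is.
        if not (status == 429 and code == "backend_busy"):
            return result
        delay = base_delay_s * (2 ** (attempt - 1))
        # Jitter spreads out retries from clients that were rejected together.
        sleep(delay + random.uniform(0, delay / 2))
        result = send()
    return result
```

Other errors, such as 402 monthly_cap_exceeded, are deliberately not retried here, since waiting does not clear them.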

2026-05-22

epithre-omni backend capacity expanded further. Aggregate concurrent cap raised again following sustained stress validation — ~50% more concurrent headroom on top of the May 17 increase. Bursty workloads previously bottlenecked at peak are now absorbed transparently. Combined with the May 20 queue grace, short bursts very rarely touch the cap. No client change needed.

2026-05-21 — Currency migration: USD → IDR

All pricing now in IDR (Indonesian Rupiah). Existing balances converted at 17,600 IDR/USD on this date.
API field rename: cost_usd, amount_usd, monthly_usd_cap, credit_balance_usd, input_per_mtok_usd, output_per_mtok_usd, per_unit_usd → corresponding *_idr fields. Webhook batch payload field total_cost_usd → total_cost_idr.
Tier B premium pricing applied (audit-validated against May 2026 market — Anthropic/OpenAI/Together AI/Replicate). See Pricing.
Signup credit: Rp50,000 (was $5).
Default monthly cap per key: Rp1,000,000 (was $100).
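For client code that still refers to the old field names, the rename and the balance conversion above can be sketched as below. The helper names are hypothetical. The name mapping simply swaps the usd token for idr, which matches monthly_usd_cap → monthly_idr_cap and total_cost_usd → total_cost_idr; the other *_idr names are inferred from the same pattern. The 17,600 rate applies to existing balances only, not to prices, which were re-set separately.

```python
from decimal import Decimal

# Balance conversion rate stated in this changelog entry.
IDR_PER_USD = Decimal("17600")


def usd_balance_to_idr(balance_usd: str) -> Decimal:
    """Convert a pre-migration USD balance to IDR at the stated rate."""
    return Decimal(balance_usd) * IDR_PER_USD


def idr_field_name(name: str) -> str:
    """Map a legacy field name to its IDR counterpart (hypothetical helper).

    Only a whole 'usd' token between underscores is replaced, so unrelated
    field names pass through unchanged.
    """
    return "_".join("idr" if part == "usd" else part for part in name.split("_"))
```

For example, a pre-migration balance of $12.50 converts to Rp220,000, and idr_field_name("credit_balance_usd") gives "credit_balance_idr".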

2026-05-20

epithre-omni backend queue. When the shared Omni pool is saturated, requests now wait up to 45 seconds for a slot to free before returning HTTP 429 backend_busy, instead of being rejected immediately. Short bursts that previously triggered 429 spam are absorbed as a small first-byte latency increase. PRME and LYT behavior unchanged (still immediate 429). The response shape and code on actual rejection are identical, so no client change is needed beyond making sure your HTTP client read timeout is at least 90s. See Rate limits and Troubleshooting.

2026-05-19

tool_choice="required" reliability fix on epithre-omni. Prior long-prompt stall (>5K tokens combined with strict tool-call enforcement) is resolved. Bare "required", named tool choice, and "auto" all work reliably across the full prompt-length range. response_format: json_schema strict mode benefits from the same fix. No client change needed.

2026-05-17

epithre-omni backend capacity expanded. Aggregate concurrent cap raised significantly on the flagship tier. 429 backend_busy rates drop for bursty workloads. No client change needed.
/v1/rerank defensive auto-truncate. Documents longer than 6000 characters are now server-side clamped to fit the reranker's input window, instead of returning 422. Mirrors the auto-truncate behavior already on /v1/embeddings. Transparent — no response shape change.

2026-05-15

Credit balance moved to account level. Single pool per account, drained by every key. Topping up no longer requires picking a key. Matches Anthropic / OpenAI / DeepSeek pattern. Existing balances were merged: sum of your active keys' previous balances is now your account balance. No action needed.
monthly_idr_cap per key now enforced (was cosmetic). When a key hits its monthly IDR cap, it returns HTTP 402 monthly_cap_exceeded. Other keys keep working. Default cap Rp1,000,000. Set to 0 for no per-key cap.
Admin dashboard: per-key Edit button (rpm / rpd / concurrency / monthly cap); credit Top-up moved to user-level button.
/admin/system-health now reports embed and rerank as separate rows.

2026-05-14

Multi-page docs (this site). Split from single-page to per-topic structure.
/v1/retrieval endpoint. Upload knowledge files (PDF/TXT/MD), search via cosine over chunked corpus. Turnkey RAG.
Files API extended with purpose=knowledge for retrieval ingest.
Knowledge processor async worker: extracts text, chunks recursively, embeds, indexes.
Embed model upgraded to a multimodal backbone. Text and image vectors now share a 4000-dim space. Cross-modal retrieval enabled.
Image embed exposed via /v1/embeddings with {"type": "image", ...} input items. Mixed batch supported.
Auto-truncate on /v1/embeddings: text >10K chars auto-clamped to head. Opt out via truncate: "NONE".
Prompt caching with explicit cache_control markers. 1.25x write, 0.1x read.
Batch API at 50% off.
Webhooks for batch terminal events. HMAC-signed, exponential backoff retry.
Structured output via response_format json_schema strict mode.
Light-theme docs + dashboard redesign.
Legal pages v1.1: full retention table, UU 27/2022 PDP reference, DPA mention.
PRME context corrected: advertised 200K -> actual 180K (matches backend cap).
Pricing table split: epithre-embed-image row added at Rp25 / image.
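As a rough illustration of the prompt caching multipliers listed above (1.25x write, 0.1x read), the arithmetic can be sketched as below. This assumes the multipliers apply to the base input price per million tokens; the function name and the base price used in the example are hypothetical, so check Pricing for actual rates.

```python
from decimal import Decimal

CACHE_WRITE_MULT = Decimal("1.25")  # from the changelog: 1.25x on cache write
CACHE_READ_MULT = Decimal("0.1")    # from the changelog: 0.1x on cache read


def input_cost_idr(
    base_per_mtok_idr: str,
    uncached_tokens: int,
    cache_write_tokens: int,
    cache_read_tokens: int,
) -> Decimal:
    """Estimate input cost in IDR, assuming multipliers scale the base input price."""
    weighted_tokens = (
        Decimal(uncached_tokens)
        + Decimal(cache_write_tokens) * CACHE_WRITE_MULT
        + Decimal(cache_read_tokens) * CACHE_READ_MULT
    )
    return Decimal(base_per_mtok_idr) * weighted_tokens / Decimal(1_000_000)
```

With a hypothetical base price of Rp10,000 per million input tokens and a 100,000-token prefix, the first call (cache write) costs Rp1,250 and each later call that reads the cache costs Rp100, against Rp1,000 per call without caching. Under these assumptions caching pays for itself from the second call onward.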

2026-05-13

Embed + rerank backend hardware upgrade. No customer-visible API change; throughput headroom improved.
Iris (epithre-iris) safety v2: layered CSAM + deepfake + LLM classifier filters.

2026-05-11

Omni capacity expanded. Throughput headroom increased for the flagship chat tier.
Initial Epithre Platform launch (P0-P6): all chat / embed / rerank / image endpoints live, auth + billing + dashboard + admin + email verification + signup credit.

2026-05-09

PRME tier launched. Long-context premium chat (180K context window). Sister model to the flagship epithre-omni for long-document and codebase workloads.

Pre-launch

Stack design and infrastructure work. Not customer-visible.

How to follow

Subscribe to release announcements via email: hello@epithre.com with subject "Subscribe changelog".
A platform.update webhook event is on the roadmap.

Reporting a regression

If something used to work and now doesn't, email us with what you were doing, when it started failing, and the request ID if you have one. We treat regressions as P0 during alpha.