`epithre-omni`

The flagship. Default choice for most chat workloads.

Capabilities

Capability	Notes
Tier	Flagship
Context window	49,152 tokens
Max output	16,384 tokens
Modalities	Text, image (vision)
Tool use	Yes; standard `tools` + `tool_choice` schema
Extended thinking	Yes; opt-in via `chat_template_kwargs={"enable_thinking": true}`
Structured output	Yes; `response_format` json_object + json_schema strict
Prompt caching	Yes; `cache_control` markers
Streaming	Yes; SSE
Indonesian fluency	Native (tuned on Indonesian corpora)

When to use

General chat with vision
Agentic tool-use chains
Anything that needs strong Indonesian register handling
Vision QA on documents, screenshots, photos
RAG synthesis after retrieval

When NOT to use

Very long context (>32K tokens) - use epithre-prme (180K) instead.
High-throughput cheap classification - use epithre-lyt (~6x cheaper).
Audio or video input - use epithre-lyt.

Pricing

Input: Rp7,000 / 1M tokens
Output: Rp25,000 / 1M tokens
Prompt cache write: 1.25x input rate (Rp8,750 / 1M)
Prompt cache read: 0.1x input rate (Rp700 / 1M)
Batch: 0.5x all rates

Performance benchmarks

Task	Score (vs.)
Indonesian general chat	Strong (native fluency)
Indonesian legal QA (with RAG)	~95% accuracy on 200-question gold set
Indonesian sentiment classification	~92% F1 (3-class)
Indonesian vision QA (invoice extraction)	~94% field-level F1
English chat	Strong on everyday and technical English

Latency

Non-streaming, 500-token reply: ~3-8s
Streaming first-token: ~0.4-1.2s
With thinking enabled: +2-10s depending on problem complexity

Caveats

tool_choice="required" is reliable as of May 2026 — prior stall on long prompts (>5K tokens) is resolved. Bare "required", named tool choice, and "auto" all work cleanly on Omni.
Structured output (json_schema) benefits from the same fix. Long-input schema-constrained generation is stable; fall back to json_object only if you hit a specific edge case.

See also