The flagship. Default choice for most chat workloads.
| Capability |
Notes |
| Tier |
Flagship |
| Context window |
49,152 tokens |
| Max output |
16,384 tokens |
| Modalities |
Text, image (vision) |
| Tool use |
Yes; standard tools + tool_choice schema |
| Extended thinking |
Yes; opt-in via chat_template_kwargs={"enable_thinking": true} |
| Structured output |
Yes; response_format json_object + json_schema strict |
| Prompt caching |
Yes; cache_control markers |
| Streaming |
Yes; SSE |
| Indonesian fluency |
Native (tuned on Indonesian corpora) |
- General chat with vision
- Agentic tool-use chains
- Anything that needs strong Indonesian register handling
- Vision QA on documents, screenshots, photos
- RAG synthesis after retrieval
- Very long context (>32K tokens) - use
epithre-prme (180K) instead.
- High-throughput cheap classification - use
epithre-lyt (~6x cheaper).
- Audio or video input - use
epithre-lyt.
- Input: Rp7,000 / 1M tokens
- Output: Rp25,000 / 1M tokens
- Prompt cache write: 1.25x input rate (Rp8,750 / 1M)
- Prompt cache read: 0.1x input rate (Rp700 / 1M)
- Batch: 0.5x all rates
| Task |
Score (vs.) |
| Indonesian general chat |
Strong (native fluency) |
| Indonesian legal QA (with RAG) |
~95% accuracy on 200-question gold set |
| Indonesian sentiment classification |
~92% F1 (3-class) |
| Indonesian vision QA (invoice extraction) |
~94% field-level F1 |
| English chat |
Strong on everyday and technical English |
- Non-streaming, 500-token reply: ~3-8s
- Streaming first-token: ~0.4-1.2s
- With thinking enabled: +2-10s depending on problem complexity
tool_choice="required" is reliable as of May 2026 — prior stall on long prompts (>5K tokens) is resolved. Bare "required", named tool choice, and "auto" all work cleanly on Omni.
- Structured output (json_schema) benefits from the same fix. Long-input schema-constrained generation is stable; fall back to
json_object only if you hit a specific edge case.