`epithre-lyt`

Fast, cheap, multimodal. The "do it in volume" model.

Capabilities

Capability	Notes
Tier	Compact
Context window	32,768 tokens
Max output	4,096 tokens
Modalities	Text, image, audio, video
Tool use	Basic (less reliable for multi-step chains than epithre-omni or epithre-prme)
Extended thinking	No
Structured output	json_object + json_schema
Prompt caching	Yes
Streaming	Yes

When to use

High-volume classification, sentiment, tagging
Translation pipelines (batch)
Quick chat where flagship quality isn't needed
Audio understanding and summarization (not verbatim transcription; see Caveats)
Cost-sensitive features
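
For the classification and tagging use cases above, it can help to validate every reply before trusting it, even when replies are usually well-formed. The following is a minimal sketch, not an official client: the label set and the `{"label": ...}` reply shape are hypothetical, and the function only checks text you have already received from the model.

```python
import json

# Hypothetical label set for a sentiment task; replace with your own enum.
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def parse_label(reply_text):
    """Return the label from a JSON reply, or None if the reply is unusable.

    Assumes (hypothetically) that the model was asked to answer with an
    object of the form {"label": "<one of ALLOWED_LABELS>"}.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    label = data.get("label")
    # Accept only string labels that belong to the enum.
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None
```

Replies that come back as `None` can be retried or sent to a larger model, so one malformed answer does not poison a high-volume batch.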

When NOT to use

Complex agentic tool loops - reliability degrades beyond 2-3 steps.
Deep reasoning - use epithre-omni or epithre-prme.
High-precision Indonesian legal/medical - use epithre-omni.
Long-document analysis (>16K input tokens) - use epithre-omni or epithre-prme.
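
The rules above can be turned into a small routing check that runs before a request is sent. This is only a sketch under stated assumptions: the `Task` fields are hypothetical, "16K" is read as 16,000 tokens, the tool-step limit uses the conservative end of "2-3 steps", and epithre-omni is picked as the fallback even where epithre-prme would also be a valid choice.

```python
from dataclasses import dataclass

# Thresholds taken from the list above; the exact values are assumptions.
LYT_MAX_INPUT_TOKENS = 16_000  # "16K" read as 16,000 tokens
LYT_MAX_TOOL_STEPS = 2         # conservative end of "2-3 steps"


@dataclass
class Task:
    """Hypothetical description of a request, used only for routing."""
    input_tokens: int
    tool_steps: int = 0
    needs_deep_reasoning: bool = False
    indonesian_legal_or_medical: bool = False


def pick_model(task):
    """Return the model name suggested by the 'When NOT to use' rules."""
    if task.indonesian_legal_or_medical:
        return "epithre-omni"
    if task.needs_deep_reasoning:
        return "epithre-omni"  # epithre-prme is the other documented option
    if task.tool_steps > LYT_MAX_TOOL_STEPS:
        return "epithre-omni"
    if task.input_tokens > LYT_MAX_INPUT_TOKENS:
        return "epithre-omni"  # epithre-prme is the other documented option
    return "epithre-lyt"
```

For example, `pick_model(Task(input_tokens=500))` stays on epithre-lyt, while a task with four tool steps or a 20,000-token input is escalated.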

Pricing

Input: Rp1,000 / 1M tokens
Output: Rp4,000 / 1M tokens
Cache: same multipliers
Batch: 0.5x

Output is 6x cheaper than on epithre-omni; input pricing is comparable. For high-volume work, millions of short classifications can cost roughly Rp20,000: that amount buys 20M input tokens at list price, or 40M at the batch rate.
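
To sanity-check a budget against the prices listed above, a rough estimator like the one below may be useful. It is a sketch, not a billing tool: the per-request token counts are made-up inputs, cache pricing is ignored, and it assumes the 0.5x batch multiplier applies to both input and output tokens, which this page does not spell out.

```python
# List prices from the Pricing section, in rupiah per 1M tokens.
INPUT_RP_PER_MTOK = 1_000
OUTPUT_RP_PER_MTOK = 4_000
BATCH_MULTIPLIER = 0.5  # assumed to apply to input and output alike


def estimate_cost_rp(n_requests, input_tokens_each, output_tokens_each, batch=False):
    """Estimate total cost in rupiah for n_requests similar requests."""
    input_cost = n_requests * input_tokens_each / 1_000_000 * INPUT_RP_PER_MTOK
    output_cost = n_requests * output_tokens_each / 1_000_000 * OUTPUT_RP_PER_MTOK
    total = input_cost + output_cost
    return total * BATCH_MULTIPLIER if batch else total


# One million classifications with an assumed 30 input tokens
# and 2 output tokens per request.
print(estimate_cost_rp(1_000_000, 30, 2))              # 38000.0
print(estimate_cost_rp(1_000_000, 30, 2, batch=True))  # 19000.0
```

Under these assumed token counts, a million batched classifications land close to the Rp20,000 figure; longer prompts scale the input term linearly.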

Performance characteristics

Low latency: ~0.2-0.5 s to first token, ~30-40 tokens/sec generation.
Solid Indonesian fluency in casual register; sometimes weaker than epithre-omni in formal or legal register.
Reliable for simple JSON / enum classification.
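
The latency figures above translate into a rough end-to-end estimate for a reply of a given length. The sketch below assumes a simple linear model (time to first token plus output tokens divided by generation speed) and ignores network overhead and queueing, so treat the results as ballpark numbers only.

```python
# Ranges quoted in this section.
FIRST_TOKEN_S = (0.2, 0.5)   # best case, worst case
TOKENS_PER_S = (40.0, 30.0)  # best case, worst case


def latency_range(output_tokens):
    """Return (best, worst) estimated seconds for a reply of output_tokens."""
    best = FIRST_TOKEN_S[0] + output_tokens / TOKENS_PER_S[0]
    worst = FIRST_TOKEN_S[1] + output_tokens / TOKENS_PER_S[1]
    return best, worst


for n in (20, 4096):
    best, worst = latency_range(n)
    print(f"{n} tokens: {best:.1f}-{worst:.1f} s")
```

Under this model a 20-token classification reply finishes in roughly a second, while a reply that fills the 4,096-token output cap takes on the order of two minutes.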

Caveats

Output cap is 4,096 tokens (vs 16,384 for epithre-omni and epithre-prme). Plan for short replies.
Audio and video input are supported, but the model summarizes rather than transcribes. For high-fidelity transcription, look elsewhere.
Tool calling works for single calls, but multi-step chains are unreliable - escalate to epithre-omni.

See also