epithre-embed

Multimodal embedding model optimized for Indonesian, with text and image vectors in a shared 4,000-dimensional space.

Capabilities

Capability          Notes
Tier                Embedding
Max input tokens    4,096 per text item (~10K characters effective)
Output dim          4,000 native; Matryoshka-truncatable to any size from 1 to 4,000
Modalities          Text, image
Cross-modal         Yes; text and image vectors share the same space
Instruction-aware   Yes; an instruction field supports task-specific prompting
Auto-truncate       Yes; 10K-character cap per text, with END / START / NONE modes
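The table lists the output as Matryoshka-truncatable. With Matryoshka-style embeddings, the usual client-side recipe is to keep the first k coordinates and re-normalize to unit length. The sketch below assumes a vector arrives as a plain list of floats with its most informative coordinates first. `truncate_embedding` is a hypothetical helper, not part of any epithre-embed SDK, and whether the service can return shorter vectors directly is not covered here.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` coordinates (Matryoshka prefix) and L2-normalize them.

    Hypothetical helper; assumes `vec` is a plain list of floats.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be in [1, {len(vec)}], got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated prefix has zero norm")
    # Re-normalizing keeps dot-product scores equal to cosine similarity.
    return [x / norm for x in head]


# Toy 3-dim stand-in for a 4,000-dim vector.
print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Re-normalization does not change cosine scores, but it matters if your vector store ranks by raw dot product.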
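The auto-truncate row names three modes but does not define them. The sketch below assumes the common convention for these names: END drops text from the end and keeps the start, START drops text from the start and keeps the end, and NONE rejects over-long input. It mirrors that behavior client-side, for example to preview which part of a long document would be embedded. `TruncateMode` and `apply_truncation` are hypothetical names, and the mode semantics are an assumption, not confirmed behavior of the service.

```python
from enum import Enum

MAX_CHARS = 10_000  # per-text character cap from the capability table


class TruncateMode(Enum):
    END = "END"      # assumed: cut from the end, keep the beginning
    START = "START"  # assumed: cut from the start, keep the ending
    NONE = "NONE"    # assumed: raise instead of cutting


def apply_truncation(text: str, mode: TruncateMode, max_chars: int = MAX_CHARS) -> str:
    """Hypothetical client-side preview of the assumed truncation modes."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    if mode is TruncateMode.END:
        return text[:max_chars]
    if mode is TruncateMode.START:
        return text[len(text) - max_chars:]
    raise ValueError(f"text has {len(text)} chars, over the {max_chars}-char cap")


print(apply_truncation("abcdef", TruncateMode.END, 4))    # abcd
print(apply_truncation("abcdef", TruncateMode.START, 4))  # cdef
```

Note that the 4,096-token limit may still bind before the 10K-character cap on token-dense text; this sketch only models the character cap.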

When to use

Key feature: cross-modal in one space

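Text and image embeddings live in the same 4,000-dim space, so cosine similarity between any text vector and any image vector is meaningful. You can score a text query directly against image vectors, or an image against text vectors, without a separate model or projection step.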
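Because both modalities share one space, text-to-image search reduces to ranking image vectors by cosine similarity to a text query vector. The sketch below uses toy 3-dim vectors as stand-ins for real embeddings; how you obtain those vectors from epithre-embed is not shown, since the request format is not documented here. `cosine` and `rank_images` are illustrative names. If you truncate vectors (Matryoshka), truncate the query and the images to the same length; `cosine` raises on a mismatch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (na * nb)


def rank_images(
    query_vec: list[float], image_vecs: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k (image name, score) pairs, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in image_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins: one text query vector and three image vectors.
query = [1.0, 0.0, 0.0]
images = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 1.0, 0.0],
    "cat_and_dog.jpg": [0.7, 0.7, 0.0],
}
print([name for name, _ in rank_images(query, images)])
# ['cat.jpg', 'cat_and_dog.jpg', 'car.jpg']
```

The same function works in the other direction (image query against text vectors); only the inputs change. For large collections, a vector index would replace the linear scan.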

Output details

Pricing

Performance

Limits

Caveats

See also