Embeddings

POST /v1/embeddings

Convert text and/or images into 4000-dim L2-normalized vectors. Text and image vectors live in the same semantic space, so cosine similarity between any pair is meaningful (cross-modal RAG).

The model is Indonesian-optimized and multimodal. Its output is MRL-truncatable (Matryoshka Representation Learning) to any dimension between 1 and 4000.

Request body

Field	Type	Required	Description
`model`	string	yes	Must be `epithre-embed`.
`input`	string OR array	yes	Single string, or 1-64 strings, or mixed array of strings + image objects. See "Input shapes" below.
`dimensions`	int	no	1-4000. Keeps the first N dims of the output and re-L2-normalizes (Matryoshka prefix truncation; smaller N trades some accuracy for storage). Default 4000.
`instruction`	string	no	Retrieval task instruction prefix. Text-only; not allowed with image input.
`truncate`	string	no	`"END"` (default, keep first 10K chars), `"START"` (keep last 10K), `"NONE"` (422 error if too long).
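
The `dimensions` behavior described above (keep a prefix, then re-L2-normalize) can be sketched client-side. This is a minimal illustration, assuming the server does exactly what the table says; `truncate_and_renormalize` is a hypothetical helper name, not part of the API, and the 4-dim vector is a toy stand-in for a 4000-dim embedding.

```python
import math

def truncate_and_renormalize(vec, dims):
    """Keep the first `dims` components of `vec`, then rescale to unit L2 norm."""
    if not 1 <= dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return list(prefix)  # an all-zero prefix cannot be normalized
    return [x / norm for x in prefix]

# Toy unit-norm vector standing in for a full-size embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_renormalize(full, 2)
print(len(short), sum(x * x for x in short))  # length 2, squared norm close to 1.0
```

Under that assumption, storing full-size vectors once and truncating them locally should give the same result as re-requesting them with a smaller `dimensions` value.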

Input shapes

Each item in the input array is either a string or an image object:

// Single text
"input": "kucing oren tidur di kasur"

// Multiple texts
"input": ["text 1", "text 2", "text 3"]

// Image item: raw base64
{"type": "image", "image": "iVBORw0KGgo..."}

// Image item: data URI
{"type": "image", "image_url": "data:image/jpeg;base64,..."}

// Image item: chat-content style with image_url object (also accepted)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

// Mixed:
"input": [
    "text query",
    {"type": "image", "image": "iVBORw0KGgo..."},
    "another text"
]

Accepted image formats: PNG, JPEG, WebP, GIF. Max 20 MB per image (post-base64-decode).
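
As a rough sketch of how the data-URI image item above might be built with a client-side size and format check: `make_image_item` is a hypothetical helper, and the code assumes "20 MB" means 20 × 1024 × 1024 bytes, which this page does not specify.

```python
import base64

# Assumption: "20 MB" is interpreted as 20 MiB; adjust if the API counts 10^6-byte megabytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
ALLOWED_MIME = {"image/png", "image/jpeg", "image/webp", "image/gif"}

def make_image_item(raw: bytes, mime: str = "image/jpeg") -> dict:
    """Build an image input item in the data-URI shape shown above."""
    if mime not in ALLOWED_MIME:
        raise ValueError(f"unsupported image type: {mime}")
    if len(raw) > MAX_IMAGE_BYTES:
        # The limit applies to the decoded bytes, not the base64 string.
        raise ValueError("image exceeds the per-image size limit")
    b64 = base64.b64encode(raw).decode("ascii")
    return {"type": "image", "image_url": f"data:{mime};base64,{b64}"}

item = make_image_item(b"\x89PNG", "image/png")  # placeholder bytes, not a real image
print(item["type"], item["image_url"][:22])
```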

Each text item is capped at 10,000 characters server-side (calibrated below the underlying model's 4096-token context, with a safety margin for dense legal/code text at ~2.8 chars/token). By default, longer inputs are truncated to their first 10,000 characters, and the response reports this in usage.truncated_count.

Change or disable truncation via truncate:

{
  "model": "epithre-embed",
  "input": ["very long text..."],
  "truncate": "END"     // default - keep first 10K chars of each item
  // "truncate": "START" // keep last 10K chars instead
  // "truncate": "NONE"  // strict - return 422 if any item is too long
}
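
The three `truncate` modes can be mirrored locally to predict what the server will keep. This is only a sketch: `apply_truncate` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len()` counts code points, which is not confirmed here.

```python
MAX_CHARS = 10_000  # per-item cap stated above

def apply_truncate(text, mode="END", max_chars=MAX_CHARS):
    """Return the part of `text` the documented truncate mode would keep."""
    if len(text) <= max_chars:
        return text
    if mode == "END":      # drop the end, keep the first max_chars
        return text[:max_chars]
    if mode == "START":    # drop the start, keep the last max_chars
        return text[-max_chars:]
    if mode == "NONE":     # strict mode: the API would answer 422
        raise ValueError("text exceeds the character limit")
    raise ValueError(f"unknown truncate mode: {mode}")

sample = "a" * 5 + "b" * 5
print(apply_truncate(sample, "END", max_chars=5))    # aaaaa
print(apply_truncate(sample, "START", max_chars=5))  # bbbbb
```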

Limits

Max 64 text items per request
Max 8 image items per request
Max 25 MB request body (multipart images)
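
For corpora larger than the per-request text limit, one simple approach is to slice the list into batches of at most 64 items. The helper name `chunked` is illustrative; this sketch covers text items only and does not account for the 8-image or 25 MB limits.

```python
MAX_TEXTS_PER_REQUEST = 64  # text-item limit listed above

def chunked(items, size=MAX_TEXTS_PER_REQUEST):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f"doc {i}" for i in range(150)]
batch_sizes = [len(batch) for batch in chunked(texts)]
print(batch_sizes)  # [64, 64, 22]
```

Each batch can then be sent as the `input` array of one request.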

Performance: batch texts in one request

When you have multiple texts to embed at the same point in your pipeline (chunked documents, fact extraction over a message, pre-computing a candidate set), send them as a single request with input as an array — not as N sequential requests of a single string each.

# Slower: N HTTP roundtrips, N scheduler ticks
for text in texts:
    resp = client.embeddings.create(model="epithre-embed", input=[text])

# Faster: 1 HTTP roundtrip, 1 batched server call
resp = client.embeddings.create(model="epithre-embed", input=texts)

Measured wall-clock for a 10-text burst on the realtime endpoint:

Pattern	Mean	p95
Sequential, 10 separate requests	1.65 s	2.10 s
Batched, 1 request with 10 texts	0.50 s	0.63 s
Speedup	3.3×	3.3×

The win comes from a single HTTP roundtrip, a single backend scheduler tick, and more efficient prefill batching server-side. The server does not hold requests back to accumulate a batch, so the items must already be in memory together when you make the call.

If your items arrive over time and waiting to batch them would add unwanted latency, prefer parallel calls via asyncio.gather. If your workload can run offline (24h SLA), use the Batch API for an additional 50% discount.
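
A minimal sketch of the asyncio.gather pattern, with a cap on in-flight requests. `embed_concurrently` and `fake_embed` are hypothetical names; `fake_embed` is a stand-in for an async call to this endpoint (for example via an async SDK client) so that the example runs without network access.

```python
import asyncio

async def embed_concurrently(embed_one, texts, max_in_flight=8):
    """Run `embed_one(text)` for every text in parallel, at most `max_in_flight` at a time."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(text):
        async with sem:
            return await embed_one(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(t) for t in texts))

async def fake_embed(text):
    # Stand-in for a real async request; returns a 1-dim placeholder "embedding".
    await asyncio.sleep(0.01)
    return [float(len(text))]

results = asyncio.run(embed_concurrently(fake_embed, ["a", "bb", "ccc"]))
print(results)  # [[1.0], [2.0], [3.0]]
```

The concurrency cap is a design choice to stay clear of rate limits; the right value depends on your account's limits, which are not stated here.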

Response shape

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0107, -0.0022, 0.0152, ...],
      "index": 0,
      "type": "text"
    },
    {
      "object": "embedding",
      "embedding": [...],
      "index": 1,
      "type": "image"
    }
  ],
  "model": "epithre-embed",
  "usage": {
    "prompt_tokens": 24,
    "total_tokens": 24,
    "image_count": 1,
    "truncated_count": 0
  }
}
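
When working with the raw JSON rather than an SDK object, it can be handy to separate text and image vectors using the `type` and `index` fields. The sketch below uses a hand-written response with shortened placeholder vectors; `split_by_type` is a hypothetical helper.

```python
def split_by_type(response):
    """Group (index, embedding) pairs by item type, ordered by input index."""
    groups = {}
    for item in sorted(response["data"], key=lambda d: d["index"]):
        groups.setdefault(item["type"], []).append((item["index"], item["embedding"]))
    return groups

# Hand-written sample in the shape shown above; the vectors are placeholders.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.6, 0.8], "index": 0, "type": "text"},
        {"object": "embedding", "embedding": [1.0, 0.0], "index": 1, "type": "image"},
    ],
    "model": "epithre-embed",
    "usage": {"prompt_tokens": 24, "total_tokens": 24, "image_count": 1, "truncated_count": 0},
}

groups = split_by_type(sample_response)
print(sorted(groups))        # ['image', 'text']
print(groups["image"][0][0]) # 1
```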

Vectors are L2-normalized (Euclidean norm = 1). For cosine similarity between two vectors, dot product is sufficient (no need to re-normalize).
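
Because the vectors have unit norm, ranking candidates by dot product is equivalent to ranking by cosine similarity. A small sketch with toy 2-dim unit vectors (the names `cat_photo` and `dog_photo` are made up for illustration):

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit L2 norm."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, dot(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy unit vectors standing in for a text query and two image embeddings.
query = [1.0, 0.0]
candidates = {"dog_photo": [0.0, 1.0], "cat_photo": [0.8, 0.6]}

ranking = rank_by_similarity(query, candidates)
print([name for name, _ in ranking])  # ['cat_photo', 'dog_photo']
```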

Examples

Text embedding with MRL truncation

import numpy as np

# `client` is assumed to be an OpenAI-compatible SDK client configured for this API.
resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "penebangan liar di hutan lindung",
        "perlindungan satwa dilindungi UU 5/1990",
    ],
    dimensions=1024,    # truncate from 4000 for smaller storage
)
vecs = np.array([d.embedding for d in resp.data])
print(vecs.shape)  # (2, 1024)

Image embedding

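import base64
import httpx

# EPITHRE_KEY is assumed to hold your API key (for example, loaded from the environment).
with open("product.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = httpx.post(
    "https://api.epithre.com/v1/embeddings",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-embed",
        "input": [{"type": "image", "image": img_b64}],
    },
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
r = resp.json()

img_vec = r["data"][0]["embedding"]   # 4000-dim, L2-normalized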

Mixed text + images

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "kucing oren tidur di kasur",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{cat_photo_b64}"},
        "anjing golden retriever",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{dog_photo_b64}"},
    ],
)
# resp.data preserves input order; each item has .type field
for item in resp.data:
    print(item.index, item.type, item.embedding[:3])

With instruction prefix (text only)

resp = client.embeddings.create(
    model="epithre-embed",
    input=["UU 41/1999 tentang Kehutanan pasal 50"],
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)

The instruction prefix is a native feature of the embedding model that can improve retrieval accuracy for specific tasks. Common instructions:

"Represent this document for retrieval:" - document-side embedding for retrieval (the default).
"Represent this query for retrieving relevant documents:" - query-side embedding.
"Represent this text for classification:" - embedding for classification or clustering.

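Since `instruction` is text-only and the API rejects it alongside image input with a 422, the request body can be checked before sending. The following is a sketch only; `build_payload` is a hypothetical helper, and it treats any dict item as an image object, as in the input shapes on this page.

```python
def build_payload(inputs, instruction=None, dimensions=None):
    """Assemble a request body, refusing instruction + image combinations up front."""
    items = [inputs] if isinstance(inputs, str) else list(inputs)
    has_image = any(isinstance(item, dict) for item in items)
    if instruction is not None and has_image:
        raise ValueError("instruction is text-only and cannot be combined with image input")
    payload = {"model": "epithre-embed", "input": inputs}
    if instruction is not None:
        payload["instruction"] = instruction
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return payload

query_payload = build_payload(
    ["UU 41/1999 tentang Kehutanan pasal 50"],
    instruction="Represent this query for retrieving relevant documents:",
)
print(sorted(query_payload))  # ['input', 'instruction', 'model']
```
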
Errors

HTTP	Cause
400	Invalid `model`, empty `input`, more than 64 text items, more than 8 images.
413	Body exceeds 25 MB.
422	`truncate: "NONE"` and a text exceeds 10K chars, or `instruction` combined with image input.
429	Rate limit or backend busy.
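
A 429 is usually worth retrying after a pause. The sketch below shows exponential backoff with jitter around any callable that returns a `(status, body)` pair; `call_with_backoff` and the delay values are illustrative assumptions, and this page does not say whether the API sends a Retry-After header, so none is used.

```python
import random
import time

def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff (0.5 s, 1 s, 2 s, ...) plus random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status, body  # still 429 after the final attempt

# Demo with a stub that is rate limited twice, then succeeds; sleeping is skipped.
attempts = {"n": 0}

def flaky_send():
    attempts["n"] += 1
    return (429, None) if attempts["n"] < 3 else (200, {"object": "list"})

status, body = call_with_backoff(flaky_send, sleep=lambda seconds: None)
print(status, attempts["n"])  # 200 3
```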

Pricing

Text: Rp1,500 per 1M input tokens
Image: Rp25 per image
Storage is free.
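
As a worked example of the rates above, the cost of a job can be estimated from the `usage` fields of each response. This sketch assumes that `prompt_tokens` is the billed text-token count and that images are billed only at the flat per-image rate; neither is confirmed on this page, and `estimate_cost_rp` is a hypothetical helper.

```python
TEXT_RP_PER_MILLION_TOKENS = 1_500  # text rate listed above
IMAGE_RP_EACH = 25                  # image rate listed above

def estimate_cost_rp(prompt_tokens, image_count):
    """Estimated cost in rupiah for a given token and image count."""
    text_cost = prompt_tokens * TEXT_RP_PER_MILLION_TOKENS / 1_000_000
    return text_cost + image_count * IMAGE_RP_EACH

# 2M text tokens -> Rp3,000; 100 images -> Rp2,500.
print(estimate_cost_rp(2_000_000, 100))  # 5500.0
```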

See pricing.