Embeddings

POST /v1/embeddings

Convert text and/or images into 4000-dim L2-normalized vectors. Text and image vectors live in the same semantic space, so cosine similarity is meaningful between any pair: text-text, image-image, or text-image (cross-modal RAG).

The model is a multimodal embedding optimized for Indonesian. It supports Matryoshka-style (MRL) truncation to any dimension from 1 to 4000.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Must be epithre-embed. |
| input | string OR array | yes | Single string, or 1-64 strings, or mixed array of strings + image objects. See "Input shapes" below. |
| dimensions | int | no | 1-4000. Truncates output to the first N dims and re-L2-normalizes (Matryoshka prefix: the first N dims form a usable embedding on their own). Default 4000. |
| instruction | string | no | Retrieval task instruction prefix. Text-only; not allowed with image input. |
| truncate | string | no | "END" (default, keep first 10K chars), "START" (keep last 10K), "NONE" (422 error if too long). |
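The dimensions behavior (keep the first N components, then re-normalize) can also be reproduced on full-size vectors you have already stored. The following is a minimal sketch with NumPy, assuming that slicing a 4000-dim vector client-side matches what the server returns for the same dimensions value. truncate_embeddings is a hypothetical helper name, not part of the API, and the input vectors are random stand-ins.

```python
import numpy as np

def truncate_embeddings(vecs: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of each row and re-L2-normalize.

    Hypothetical helper: mirrors the documented `dimensions` behavior,
    assuming client-side slicing matches the server's output.
    """
    if not 1 <= dims <= vecs.shape[-1]:
        raise ValueError(f"dims must be in 1..{vecs.shape[-1]}")
    prefix = vecs[..., :dims]
    norms = np.linalg.norm(prefix, axis=-1, keepdims=True)
    # Guard against an all-zero prefix to avoid division by zero.
    return prefix / np.where(norms == 0, 1.0, norms)

# Random stand-ins for two full-size 4000-dim embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 4000))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 1024)
print(small.shape)  # (2, 1024)
```

Storing the full vectors once and truncating on read lets you compare several sizes without re-embedding.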

Input shapes

Each item in the input array is either a string or an image object:

// Single text
"input": "kucing oren tidur di kasur"

// Multiple texts
"input": ["text 1", "text 2", "text 3"]

// Image item: raw base64
{"type": "image", "image": "iVBORw0KGgo..."}

// Image item: data URI
{"type": "image", "image_url": "data:image/jpeg;base64,..."}

// Image item: chat-content style with image_url object (also accepted)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

// Mixed:
"input": [
    "text query",
    {"type": "image", "image": "iVBORw0KGgo..."},
    "another text"
]

Accepted image formats: PNG, JPEG, WebP, GIF. Max 20 MB per image (post-base64-decode).
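A small client-side check can catch unsupported formats and oversized files before the request is sent. This is a sketch only: image_item is a hypothetical helper, the format is guessed from the file extension rather than the file contents, and "20 MB" is assumed to mean 20,000,000 bytes because the exact unit is not stated.

```python
import base64
from pathlib import Path

# Assumption: decimal megabytes; the exact unit is not stated in the docs.
MAX_IMAGE_BYTES = 20_000_000
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

def image_item(path: str) -> dict:
    """Build a data-URI image item for the `input` array (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    mime = MIME_BY_SUFFIX.get(suffix)
    if mime is None:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    raw = Path(path).read_bytes()
    # The documented limit applies to the decoded bytes, not the base64 text.
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(raw)} bytes, over the 20 MB limit")
    b64 = base64.b64encode(raw).decode("ascii")
    return {"type": "image", "image_url": f"data:{mime};base64,{b64}"}
```

Base64 adds roughly one third to the payload, so a single image near 20 MB encodes to about 26.7 MB of text and would likely hit the 25 MB body limit listed under Errors first.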

Auto-truncate behavior

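Each text item is capped at 10,000 characters server-side. The cap is calibrated to stay below the underlying model's 4096-token context, with a safety margin for dense legal or code text at ~2.8 chars/token (10,000 chars is roughly 3,570 tokens). Longer inputs are truncated automatically, keeping the first 10,000 characters by default; usage.truncated_count in the response reports how many items were cut.

Change or disable this behavior with truncate: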

{
  "model": "epithre-embed",
  "input": ["very long text..."],
  "truncate": "END"     // default - keep first 10K chars of each item
  // "truncate": "START" // keep last 10K chars instead
  // "truncate": "NONE"  // strict - return 422 if any item is too long
}
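To preview what each mode keeps before sending a request, the three modes can be mimicked locally. This is a hedged sketch: apply_truncate is a hypothetical helper, and it assumes the server counts characters the same way Python's len() does on a str, which is not confirmed here.

```python
MAX_CHARS = 10_000  # per-item cap stated above

def apply_truncate(text: str, mode: str = "END") -> str:
    """Client-side preview of the documented `truncate` modes (hypothetical helper)."""
    if len(text) <= MAX_CHARS:
        return text
    if mode == "END":      # cut the end, keep the first 10K chars
        return text[:MAX_CHARS]
    if mode == "START":    # cut the start, keep the last 10K chars
        return text[-MAX_CHARS:]
    if mode == "NONE":     # strict: the server would answer 422 instead
        raise ValueError(f"text has {len(text)} chars, limit is {MAX_CHARS}")
    raise ValueError(f"unknown truncate mode: {mode!r}")

# 12,000 chars: 6,000 "a" followed by 6,000 "b".
long_text = "a" * 6_000 + "b" * 6_000
head = apply_truncate(long_text, "END")
tail = apply_truncate(long_text, "START")
print(head.count("a"), tail.count("a"))  # 6000 4000
```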

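Limits

| Limit | Value |
| --- | --- |
| Text items per request | 64 |
| Images per request | 8 |
| Image size | 20 MB per image (post-base64-decode) |
| Request body | 25 MB |
| Text length | 10,000 characters per item (see "Auto-truncate behavior") |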

Performance: batch texts in one request

When you have multiple texts to embed at the same point in your pipeline (chunked documents, fact extraction over a message, pre-computing a candidate set), send them as a single request with input as an array, not as N sequential single-string requests.

# Slower: N HTTP roundtrips, N scheduler ticks
for text in texts:
    resp = client.embeddings.create(model="epithre-embed", input=[text])

# Faster: 1 HTTP roundtrip, 1 batched server call
resp = client.embeddings.create(model="epithre-embed", input=texts)

Measured wall-clock for a 10-text burst on the realtime endpoint:

| Pattern | Mean | p95 |
| --- | --- | --- |
| Sequential, 10 separate requests | 1.65 s | 2.10 s |
| Batched, 1 request with 10 texts | 0.50 s | 0.63 s |
| Speedup | 3.3× | 3.3× |

The win comes from a single HTTP roundtrip, a single backend scheduler tick, and more efficient prefill batching server-side. There is no artificial wait: the items must already be in memory together when you call.
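When a workload has more texts than one request accepts (input takes 1-64 strings), the list can be split into request-sized batches. A minimal sketch follows; batches is a hypothetical helper name, and the commented-out loop assumes the same client object used in the snippets above.

```python
from typing import Iterator, Sequence

MAX_ITEMS_PER_REQUEST = 64  # per-request text limit from the request body table

def batches(
    items: Sequence[str], size: int = MAX_ITEMS_PER_REQUEST
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f"chunk {i}" for i in range(150)]
print([len(b) for b in batches(texts)])  # [64, 64, 22]

# Usage with the API (assumes `client` as in the snippets above):
# vectors = []
# for batch in batches(texts):
#     resp = client.embeddings.create(model="epithre-embed", input=list(batch))
#     vectors.extend(d.embedding for d in resp.data)
```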

If your items arrive over time and waiting to collect them would add unwanted latency, prefer parallel calls via asyncio.gather. If your workload tolerates offline processing (24h SLA), use the Batch API for a 50% discount on top.
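The parallel-call pattern with asyncio.gather can be sketched as follows. The embed_one callable is a hypothetical stand-in for whatever async request function you use, fake_embed only simulates latency, and the concurrency cap of 8 is an arbitrary assumption rather than a documented limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Vector = List[float]

async def embed_concurrently(
    items: Sequence[str],
    embed_one: Callable[[str], Awaitable[Vector]],
    max_concurrency: int = 8,  # arbitrary cap, not a documented limit
) -> List[Vector]:
    """Run one request per item in parallel; results keep the input order."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(item: str) -> Vector:
        async with gate:
            return await embed_one(item)

    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_embed(text: str) -> Vector:
    # Stand-in for a real async API call; replace with your own client call.
    await asyncio.sleep(0.01)
    return [float(len(text))]

print(asyncio.run(embed_concurrently(["a", "bb", "ccc"], fake_embed)))
# [[1.0], [2.0], [3.0]]
```

The semaphore keeps a burst of arrivals from opening an unbounded number of connections, which also lowers the chance of a 429.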

Response shape

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0107, -0.0022, 0.0152, ...],
      "index": 0,
      "type": "text"
    },
    {
      "object": "embedding",
      "embedding": [...],
      "index": 1,
      "type": "image"
    }
  ],
  "model": "epithre-embed",
  "usage": {
    "prompt_tokens": 24,
    "total_tokens": 24,
    "image_count": 1,
    "truncated_count": 0
  }
}

Vectors are L2-normalized (Euclidean norm = 1). For cosine similarity between two vectors, dot product is sufficient (no need to re-normalize).
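Because the vectors are unit length, ranking by dot product gives the same order as ranking by cosine similarity. The sketch below ranks the image items of a response against one text item. rank_images_for_text is a hypothetical helper, and the sample response uses toy 2-dim vectors for readability, not real model output.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_images_for_text(response: dict, text_index: int = 0):
    """Return (score, index) pairs for image items, best match first."""
    by_index = {d["index"]: d for d in response["data"]}
    query = by_index[text_index]["embedding"]
    scored = [
        (dot(query, d["embedding"]), d["index"])
        for d in response["data"]
        if d["type"] == "image"
    ]
    return sorted(scored, reverse=True)

# Toy response in the documented shape (2-dim unit vectors for illustration).
sample = {
    "data": [
        {"object": "embedding", "embedding": [1.0, 0.0], "index": 0, "type": "text"},
        {"object": "embedding", "embedding": [0.6, 0.8], "index": 1, "type": "image"},
        {"object": "embedding", "embedding": [0.0, 1.0], "index": 2, "type": "image"},
    ]
}
print(rank_images_for_text(sample))  # [(0.6, 1), (0.0, 2)]
```

For larger sets, stacking the vectors into a NumPy array and using a matrix product computes all pairwise scores at once.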

Examples

Text embedding with MRL truncation

import numpy as np

# `client` is assumed to be an OpenAI-compatible SDK client pointed at this API.
resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "penebangan liar di hutan lindung",
        "perlindungan satwa dilindungi UU 5/1990",
    ],
    dimensions=1024,    # truncate from 4000 for smaller storage
)

# One row per input, in input order.
vecs = np.array([d.embedding for d in resp.data])
print(vecs.shape)  # (2, 1024)

Image embedding

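import base64
import os

import httpx

# Assumes the API key is stored in the EPITHRE_KEY environment variable.
EPITHRE_KEY = os.environ["EPITHRE_KEY"]

# Read and base64-encode the image; the context manager closes the file.
with open("product.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

r = httpx.post(
    "https://api.epithre.com/v1/embeddings",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-embed",
        "input": [{"type": "image", "image": img_b64}],
    },
)
r.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
body = r.json()

img_vec = body["data"][0]["embedding"]   # 4000-dim, L2-normalized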

Mixed text + images

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "kucing oren tidur di kasur",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{cat_photo_b64}"},
        "anjing golden retriever",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{dog_photo_b64}"},
    ],
)
# resp.data preserves input order; each item has .type field
for item in resp.data:
    print(item.index, item.type, item.embedding[:3])

With instruction prefix (text only)

resp = client.embeddings.create(
    model="epithre-embed",
    input=["UU 41/1999 tentang Kehutanan pasal 50"],
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)

The instruction prefix is a native feature of the embedding model that can improve retrieval accuracy for specific tasks. Common instructions:

Errors

| HTTP | Cause |
| --- | --- |
| 400 | Invalid model, empty input, more than 64 text items, or more than 8 images. |
| 413 | Body exceeds 25 MB. |
| 422 | truncate is "NONE" and a text exceeds 10K chars, or instruction is combined with image input. |
| 429 | Rate limit or backend busy. |
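Of these errors, only 429 is transient; the others point to a problem with the request itself and will fail again if resent unchanged. One way to handle 429 is exponential backoff with jitter, sketched below. post_with_retry is a hypothetical helper, send is any callable you supply that performs the request and returns (status_code, parsed_body), the delay values are illustrative assumptions, and success is assumed to be HTTP 200.

```python
import random
import time
from typing import Callable, Tuple

def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it stops returning 429 (hypothetical helper)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:  # assumed success code
            return body
        if status != 429:
            # 400/413/422 indicate a bad request; retrying will not help.
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        if attempt < max_attempts - 1:
            # Exponential backoff (0.5 s, 1 s, 2 s, ...) plus up to 100 ms jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.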

Pricing

See pricing.

See also