Embeddings
POST /v1/embeddings
Convert text and/or images into 4000-dim L2-normalized vectors. Text and image vectors live in the same semantic space, so cosine similarity between any pair is meaningful (cross-modal RAG).
Indonesian-optimized multimodal embedding, MRL-truncatable to any dimension between 1 and 4000.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Must be epithre-embed. |
| input | string or array | yes | A single string, an array of 1-64 strings, or a mixed array of strings and image objects. See "Input shapes" below. |
| dimensions | int | no | 1-4000. Truncates the output to the first N dims and re-L2-normalizes it (Matryoshka training keeps each prefix usable as an embedding). Default 4000. |
| instruction | string | no | Retrieval task instruction prefix. Text-only; not allowed with image input. |
| truncate | string | no | "END" (default, keep the first 10K chars), "START" (keep the last 10K), or "NONE" (422 error if any item is too long). |
Input shapes
Each item in the input array is either a string or an image object:
```jsonc
// Single text
"input": "kucing oren tidur di kasur"

// Multiple texts
"input": ["text 1", "text 2", "text 3"]

// Image item: raw base64
{"type": "image", "image": "iVBORw0KGgo..."}

// Image item: data URI
{"type": "image", "image_url": "data:image/jpeg;base64,..."}

// Image item: chat-content style with image_url object (also accepted)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

// Mixed:
"input": [
  "text query",
  {"type": "image", "image": "iVBORw0KGgo..."},
  "another text"
]
```
Accepted image formats: PNG, JPEG, WebP, GIF. Max 20 MB per image (post-base64-decode).
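A minimal sketch of building the image items above from raw file bytes, with a client-side check against the accepted format list and the 20 MB decoded-size limit. Detecting the format from magic bytes, reading "20 MB" as 20,000,000 bytes, and the helper names `detect_format` and `image_item` are illustrative assumptions; the server performs its own validation.

```python
import base64
from typing import Dict

# Conservative reading of "20 MB" as decimal megabytes (assumption).
MAX_IMAGE_BYTES = 20_000_000

_MIME = {"png": "image/png", "jpeg": "image/jpeg", "webp": "image/webp", "gif": "image/gif"}


def detect_format(data: bytes) -> str:
    """Identify an accepted format from the file's leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    raise ValueError("unsupported image format (expected PNG, JPEG, WebP, or GIF)")


def image_item(data: bytes, as_data_uri: bool = False) -> Dict[str, str]:
    """Build an input item in the raw-base64 or data-URI shape."""
    fmt = detect_format(data)
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    b64 = base64.b64encode(data).decode("ascii")
    if as_data_uri:
        return {"type": "image", "image_url": f"data:{_MIME[fmt]};base64,{b64}"}
    return {"type": "image", "image": b64}


# Usage (file name is illustrative):
# with open("product.jpg", "rb") as f:
#     item = image_item(f.read(), as_data_uri=True)
```

Base64 grows data by a factor of about 4/3, so an image near the 20 MB decoded limit becomes roughly 26.7 MB of base64 text. If the 25 MB body limit applies to the encoded JSON request, such an image would not fit, so leave headroom.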
Auto-truncate behavior
Each text item is capped at 10,000 characters server-side. The cap is calibrated to stay below the underlying model's 4,096-token context with a safety margin: at ~2.8 chars/token, typical of dense legal or code text, 10,000 characters come to about 3,570 tokens. By default, longer inputs are truncated from the end (the first 10,000 characters are kept), and usage.truncated_count in the response reports how many items were cut.
Choose the behavior with the truncate field:
```jsonc
{
  "model": "epithre-embed",
  "input": ["very long text..."],
  "truncate": "END"      // default: keep the first 10K chars of each item
  // "truncate": "START" // keep the last 10K chars instead
  // "truncate": "NONE"  // strict: return 422 if any item is too long
}
```
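To know before sending which items the server will cut, the three truncate modes can be mirrored client-side. A minimal sketch, assuming the server counts characters the way Python's `len()` does (Unicode code points); the page does not say how characters are counted. `apply_truncate` is a hypothetical helper.

```python
from typing import List, Tuple

MAX_CHARS = 10_000  # documented per-item cap
_MODES = ("END", "START", "NONE")


def apply_truncate(texts: List[str], mode: str = "END") -> Tuple[List[str], int]:
    """Return (texts as the server would see them, number of items truncated)."""
    if mode not in _MODES:
        raise ValueError(f"unknown truncate mode: {mode!r}")
    out: List[str] = []
    truncated = 0
    for t in texts:
        if len(t) <= MAX_CHARS:
            out.append(t)
            continue
        if mode == "NONE":
            # The server would reject the whole request with 422.
            raise ValueError(f"item of {len(t)} chars exceeds {MAX_CHARS}")
        # END drops the tail (keeps the head); START drops the head (keeps the tail).
        out.append(t[:MAX_CHARS] if mode == "END" else t[-MAX_CHARS:])
        truncated += 1
    return out, truncated


texts, n = apply_truncate(["a" * 12_000, "short"], mode="START")
print(len(texts[0]), n)  # 10000 1
```

The returned count should line up with usage.truncated_count for the same input, under the character-counting assumption above.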
Limits
- Max 64 text items per request
- Max 8 image items per request
- Max 25 MB request body (multipart images)
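When a corpus exceeds one request's limits, it has to be split. A minimal sketch that packs an ordered list of items into batches of at most 64 text items and 8 image items each. It assumes the two limits are counted separately, as listed, treats any dict item as an image, and does not check the 25 MB body limit. `plan_batches` is a hypothetical helper.

```python
from typing import Any, Iterable, List

MAX_TEXTS = 64
MAX_IMAGES = 8


def plan_batches(items: Iterable[Any]) -> List[List[Any]]:
    """Split items, in order, so that no batch exceeds the per-request limits."""
    batches: List[List[Any]] = []
    current: List[Any] = []
    n_text = n_img = 0
    for item in items:
        is_image = isinstance(item, dict)
        # Start a new batch when adding this item would break a limit.
        if (is_image and n_img == MAX_IMAGES) or (not is_image and n_text == MAX_TEXTS):
            batches.append(current)
            current, n_text, n_img = [], 0, 0
        current.append(item)
        if is_image:
            n_img += 1
        else:
            n_text += 1
    if current:
        batches.append(current)
    return batches


print([len(b) for b in plan_batches(["chunk"] * 130)])  # [64, 64, 2]
```

Because order is preserved, concatenating the per-batch results gives vectors in the original input order; the index field in each response presumably refers to the position within that batch, so offset it if you need global indices.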
Performance: batch texts in one request
When you have multiple texts to embed at the same point in your pipeline (chunked documents, fact extraction over a message, pre-computing a candidate set), send them as a single request with input as an array — not as N sequential requests of a single string each.
```python
# Slower: N HTTP roundtrips, N scheduler ticks
vecs = []
for text in texts:
    resp = client.embeddings.create(model="epithre-embed", input=[text])
    vecs.append(resp.data[0].embedding)

# Faster: 1 HTTP roundtrip, 1 batched server call
resp = client.embeddings.create(model="epithre-embed", input=texts)
vecs = [d.embedding for d in resp.data]
```
Measured wall-clock for a 10-text burst on the realtime endpoint:
| Pattern | Mean | p95 |
|---|---|---|
| Sequential, 10 separate requests | 1.65 s | 2.10 s |
| Batched, 1 request with 10 texts | 0.50 s | 0.63 s |
| Speedup | 3.3× | 3.3× |
The gain comes from a single HTTP roundtrip, a single backend scheduler tick, and more efficient prefill batching server-side. The server does not hold requests back to accumulate a batch, so the speedup applies only when the items are already in memory together at call time.
If your items arrive over time and waiting for them would add unwanted latency, send parallel requests instead, for example with asyncio.gather. If your workload can tolerate offline processing (24 h SLA), use the Batch API for a 50% discount.
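For items that trickle in, the parallel alternative might look like the following sketch. `embed_one` stands for whatever coroutine performs a single request (for example a POST to /v1/embeddings with an async HTTP client). It is a hypothetical placeholder here, replaced by an offline stand-in so the sketch runs. The default bound of 8 concurrent requests is an arbitrary illustrative value meant to limit pressure on the 429 rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Vector = List[float]


async def embed_concurrently(
    texts: Sequence[str],
    embed_one: Callable[[str], Awaitable[Vector]],
    max_in_flight: int = 8,
) -> List[Vector]:
    """Run one request per text in parallel; results keep the input order."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(text: str) -> Vector:
        async with sem:  # cap the number of concurrent requests
            return await embed_one(text)

    # gather returns results in the order of its arguments, not completion order.
    return list(await asyncio.gather(*(guarded(t) for t in texts)))


# Stand-in for a real API call, so the sketch runs offline.
async def fake_embed_one(text: str) -> Vector:
    await asyncio.sleep(0.01)
    return [float(len(text))]


print(asyncio.run(embed_concurrently(["a", "bb", "ccc"], fake_embed_one)))
# [[1.0], [2.0], [3.0]]
```

This trades the single-roundtrip efficiency of batching for lower latency per item, so it fits only when waiting to accumulate a batch is the bigger cost.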
Response shape
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0107, -0.0022, 0.0152, ...],
      "index": 0,
      "type": "text"
    },
    {
      "object": "embedding",
      "embedding": [...],
      "index": 1,
      "type": "image"
    }
  ],
  "model": "epithre-embed",
  "usage": {
    "prompt_tokens": 24,
    "total_tokens": 24,
    "image_count": 1,
    "truncated_count": 0
  }
}
```
Vectors are L2-normalized (Euclidean norm = 1). For cosine similarity between two vectors, dot product is sufficient (no need to re-normalize).
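Because every returned vector has unit norm, cosine similarity reduces to a plain dot product. The following pure-Python illustration uses hand-made unit vectors, not real API output, to show that the two measures agree.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity when both inputs have unit norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """General cosine similarity, which divides by both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hand-made unit vectors standing in for two embeddings.
u = [0.6, 0.8, 0.0]
v = [0.0, 0.6, 0.8]
print(math.isclose(dot(u, v), cosine(u, v)))  # True
```

In practice this means a vector index can use an inner-product metric directly on the stored vectors. Vectors you truncate yourself (rather than via dimensions) need re-normalizing first.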
Examples
Text embedding with MRL truncation
```python
import numpy as np

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "penebangan liar di hutan lindung",
        "perlindungan satwa dilindungi UU 5/1990",
    ],
    dimensions=1024,  # truncate from 4000 for smaller storage
)

vecs = np.array([d.embedding for d in resp.data])
print(vecs.shape)  # (2, 1024)
```
Image embedding
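```python
import base64
import os

import httpx

EPITHRE_KEY = os.environ["EPITHRE_KEY"]  # assumption: key kept in this (hypothetical) env var

# Read and base64-encode the image; the context manager closes the file.
with open("product.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

r = httpx.post(
    "https://api.epithre.com/v1/embeddings",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-embed",
        "input": [{"type": "image", "image": img_b64}],
    },
)
r.raise_for_status()  # surface 4xx/5xx (e.g. 413, 429) instead of parsing an error body
img_vec = r.json()["data"][0]["embedding"]  # 4000-dim, L2-normalized
```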
Mixed text + images
```python
resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "kucing oren tidur di kasur",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{cat_photo_b64}"},
        "anjing golden retriever",
        {"type": "image", "image_url": f"data:image/jpeg;base64,{dog_photo_b64}"},
    ],
)

# resp.data preserves input order; each item has a .type field
for item in resp.data:
    print(item.index, item.type, item.embedding[:3])
```
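When calling the endpoint over raw HTTP, as in the image example, the same ordering and type information is available in the JSON body. A minimal sketch that orders items by index and separates text vectors from image vectors; it assumes index is the item's position in the request's input array, as the response example suggests. `split_by_type` is a hypothetical helper, and the response body below is made up in the documented shape.

```python
from typing import Any, Dict, List, Tuple

Vector = List[float]


def split_by_type(body: Dict[str, Any]) -> Tuple[Dict[int, Vector], Dict[int, Vector]]:
    """Map input index -> vector, separately for text and image items."""
    texts: Dict[int, Vector] = {}
    images: Dict[int, Vector] = {}
    for item in sorted(body["data"], key=lambda d: d["index"]):
        target = images if item["type"] == "image" else texts
        target[item["index"]] = item["embedding"]
    return texts, images


# Made-up, shortened response body in the documented shape.
body = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 1, "type": "image"},
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 0, "type": "text"},
    ],
    "model": "epithre-embed",
}
text_vecs, image_vecs = split_by_type(body)
print(sorted(text_vecs), sorted(image_vecs))  # [0] [1]
```

Keeping the original index as the key makes it easy to join each vector back to its source text or image, whichever modality it came from.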
With instruction prefix (text only)
```python
resp = client.embeddings.create(
    model="epithre-embed",
    input=["UU 41/1999 tentang Kehutanan pasal 50"],
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)
```
The instruction prefix is a native feature of the embedding model that can improve retrieval accuracy for specific tasks. Common instructions:
"Represent this document for retrieval:"- default retrieval embed."Represent this query for retrieving relevant documents:"- query-side embed."Represent this text for classification:"- classification clustering.
Errors
| HTTP | Cause |
|---|---|
| 400 | Invalid model, empty input, more than 64 text items, more than 8 images. |
| 413 | Body exceeds 25 MB. |
| 422 | truncate: NONE and a text exceeds 10K chars, or instruction combined with image input. |
| 429 | Rate limit or backend busy. |
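A 429 (rate limit or busy backend) is usually worth retrying after a pause, whereas the 400, 413, and 422 cases will fail again unchanged. A minimal sketch of exponential backoff with full jitter around any callable returning `(status_code, body)`. The `send` callable, the attempt count, and the delay values are illustrative assumptions. The injectable `sleep` is there only so the sketch can be exercised without waiting.

```python
import random
import time
from typing import Any, Callable, Tuple

Response = Tuple[int, Any]  # (HTTP status, parsed body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry only on 429; return any other response (success or error) as-is."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0.0, delay))


# Simulated server: busy twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"data": []})])
print(call_with_retry(lambda: next(responses), sleep=lambda s: None))
# (200, {'data': []})
```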
Pricing
- Text: Rp1,500 per 1M input tokens
- Image: Rp25 per image
- Storage is free.
See pricing.
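A rough cost estimate can be computed from the usage block of a response. A minimal sketch, assuming prompt_tokens counts only text tokens and images are billed solely through image_count; the page does not say how image inputs affect prompt_tokens. `estimate_cost_rp` is a hypothetical helper, and Decimal avoids float rounding in currency arithmetic.

```python
from decimal import Decimal
from typing import Any, Dict

RP_PER_MILLION_TEXT_TOKENS = Decimal(1500)
RP_PER_IMAGE = Decimal(25)


def estimate_cost_rp(usage: Dict[str, Any]) -> Decimal:
    """Estimated charge in rupiah for one response's usage block."""
    text = Decimal(usage.get("prompt_tokens", 0)) * RP_PER_MILLION_TEXT_TOKENS / Decimal(1_000_000)
    images = Decimal(usage.get("image_count", 0)) * RP_PER_IMAGE
    return text + images


# Usage block from the response example above.
print(estimate_cost_rp({"prompt_tokens": 24, "total_tokens": 24, "image_count": 1, "truncated_count": 0}))
# 25.036
```

At these rates text is cheap relative to images: 10 million text tokens come to about Rp15,000, the same as 600 images.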
See also
- Multimodal guide - cross-modal patterns.
- RAG cookbook - embed + rerank + chat pipeline.
- Cross-modal RAG cookbook - text<->image retrieval.