Epithre AI Platform · API Documentation

Indonesian-tuned AI inference, OpenAI-compatible.

Chat completions, embeddings, reranking, and image generation — all served from Jakarta, all callable with the same SDK you already use for OpenAI. Six models across three tiers, one API contract.

Drop-in compatible with the OpenAI Python and JavaScript SDKs: just change the base URL. Tuned for Bahasa Indonesia. Self-hosted (your prompts never leave Indonesia).

6 models · 200K max context (PRME) · $0.04 / 1M embedding tokens · $5 free credit on signup
5-minute quickstart

Make your first request

  1. Sign up at platform.epithre.com and verify your email (unlocks $5 free credit).
  2. In the dashboard, create an API key. Copy the esk_live_… token immediately — the full value is shown only once.
  3. Replace $EPITHRE_KEY below with your key.
curl https://api.epithre.com/v1/chat/completions \
  -H "Authorization: Bearer $EPITHRE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "epithre-omni",
    "messages": [
      {"role": "system", "content": "Kamu asisten yang menjawab singkat dalam bahasa Indonesia."},
      {"role": "user", "content": "Apa ibu kota Jepang dan apa mata uangnya?"}
    ]
  }'
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",  # ← only line that changes vs OpenAI
)

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Kamu asisten yang menjawab singkat dalam bahasa Indonesia."},
        {"role": "user", "content": "Apa ibu kota Jepang dan apa mata uangnya?"},
    ],
)
print(resp.choices[0].message.content)
# → "Ibu kota Jepang adalah Tokyo. Mata uangnya yen Jepang (JPY)."
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.EPITHRE_KEY,
  baseURL: "https://api.epithre.com/v1",  // ← only line that changes vs OpenAI
});

const resp = await client.chat.completions.create({
  model: "epithre-omni",
  messages: [
    { role: "system", content: "Kamu asisten yang menjawab singkat dalam bahasa Indonesia." },
    { role: "user", content: "Apa ibu kota Jepang dan apa mata uangnya?" },
  ],
});
console.log(resp.choices[0].message.content);
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig(os.Getenv("EPITHRE_KEY"))
    config.BaseURL = "https://api.epithre.com/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
        Model: "epithre-omni",
        Messages: []openai.ChatCompletionMessage{
            {Role: "user", Content: "Apa ibu kota Jepang?"},
        },
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
SDK compat: the OpenAI SDK works as-is. Only the base_url changes. All standard parameters (temperature, tools, stream, response_format, vision image_url) work the same way.
Coming from another provider

Migrating from OpenAI in 5 lines

Already using OpenAI? Switch to Epithre by swapping the API key, the base URL, and the model name:

# Before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.chat.completions.create(model="gpt-4o", ...)

# After
client = OpenAI(
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",
)
resp = client.chat.completions.create(model="epithre-omni", ...)

Model mapping

| OpenAI model | Epithre equivalent | Notes |
|---|---|---|
| gpt-4o / gpt-5 | epithre-omni | Multimodal flagship, similar capability tier |
| o1-pro / long-context | epithre-prme | 200K context, reasoning + coding |
| gpt-4o-mini / gpt-3.5 | epithre-lyt | Fast, cheap, multimodal incl. audio + video |
| text-embedding-3-large | epithre-embed | 4000-dim, Indonesian-tuned, MRL-truncatable |
| dall-e-3 | epithre-iris | FLUX-based, supports multi-ref editing |


Required for every call

Authentication

Send your API key in the Authorization header on every request to /v1/*:

Authorization: Bearer esk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Key types

| Prefix | Use for |
|---|---|
| esk_live_* | Production workloads. Real billing applies. |
| esk_test_* | Sandbox / CI. Same endpoints; separate billing for clarity. |


Reference

Endpoints

Base URL: https://api.epithre.com

Models

GET /v1/models

List all available models with their tiers, contexts, and capabilities. Public — no auth required.

Response shape

{
  "object": "list",
  "data": [
    {
      "id": "epithre-omni",
      "object": "model",
      "owned_by": "epithre",
      "tier": "flagship",
      "context_window": 49152,
      "max_output_tokens": 16384,
      "modalities": ["text", "image"],
      "capabilities": ["chat", "tool_use", "vision", "thinking"],
      "description": "...",
      "pricing_model": "per_token"
    },
    ...
  ]
}
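Since the endpoint is public, model discovery can be scripted without a key. Below is a minimal sketch using only the standard library; the helper names (filter_by_capability, chat_model_ids) are illustrative, not part of any SDK, while the field names come from the response shape above.

```python
import json
import urllib.request

def filter_by_capability(models, capability):
    """Keep IDs of models whose 'capabilities' list includes the capability."""
    return [m["id"] for m in models if capability in m.get("capabilities", [])]

def chat_model_ids(base_url="https://api.epithre.com"):
    """Fetch /v1/models (no auth required) and return the chat-capable IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as r:
        return filter_by_capability(json.load(r)["data"], "chat")
```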

Available models

| ID | Tier | Context | Best for |
|---|---|---|---|
| epithre-prme | Premium | 200,000 | Long-context reasoning, full-codebase analysis, document review |
| epithre-omni | Flagship | 49,152 | General chat with vision, agentic tool-use, extended thinking |
| epithre-lyt | Compact | 32,768 | High-throughput cheap chat, image + audio + video input |
| epithre-embed | Embedding | 4,096 | Semantic search, RAG retrieval (4000-dim) |
| epithre-rerank | Reranker | 2,048 | Boosting retrieval quality after embed search |
| epithre-iris | Image | — | Text-to-image, multi-reference image editing |

Chat Completions

POST /v1/chat/completions

The workhorse endpoint. Conversational text generation with optional streaming, tool calling, and vision. Compatible with OpenAI's chat-completions SDK.

When to use which model

epithre-omni

Default choice for chat. Handles text + images, supports tool calling, multilingual including strong Bahasa Indonesia.

epithre-prme

When prompts exceed 32K tokens. Full-document analysis, multi-file code review, long agentic chains.

epithre-lyt

High-volume cheap chat. Use when latency matters and you don't need flagship reasoning. Also: audio + video inputs.

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | epithre-prme, epithre-omni, or epithre-lyt |
| messages | array | yes | Standard chat messages. Roles: system, user, assistant, tool. Each message has content (string or vision array). |
| max_tokens | int | no | Max output tokens. Default model-dependent; max 16384 (omni/prme), 4096 (lyt) |
| temperature | float | no | 0.0-2.0, default 1.0. Lower = more deterministic |
| top_p | float | no | 0.0-1.0 nucleus sampling, default 1.0 |
| stream | bool | no | If true, returns SSE chunks. Final chunk includes usage. |
| tools | array | no | Function definitions. Standard OpenAI tools schema. |
| tool_choice | string or object | no | "auto" (default), "none", or {"type":"function","function":{"name":"..."}} |
| response_format | object | no | {"type":"json_object"} for guaranteed JSON output |
| chat_template_kwargs | object | no | e.g. {"enable_thinking": true} for extended thinking mode on Omni/PRME |
| seed | int | no | For reproducibility (best-effort) |
| stop | string or array | no | Stop sequences |

Response shape (non-streaming)

{
  "id": "chatcmpl-xxxxxx",
  "object": "chat.completion",
  "created": 1778455870,
  "model": "epithre-omni",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Ibu kota Jepang adalah Tokyo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 9,
    "total_tokens": 27
  }
}

Streaming response (SSE)

data: {"id":"chatcmpl-x","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-x","choices":[{"index":0,"delta":{"content":"Ibu kota "}}]}

data: {"id":"chatcmpl-x","choices":[{"index":0,"delta":{"content":"Jepang adalah "}}]}

data: {"id":"chatcmpl-x","choices":[{"index":0,"delta":{"content":"Tokyo."}}]}

data: {"id":"chatcmpl-x","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-x","choices":[],"usage":{"prompt_tokens":18,"completion_tokens":9,"total_tokens":27}}

data: [DONE]
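If you consume the stream without an SDK, each SSE line carries a JSON chunk under a "data: " prefix and the stream ends with a [DONE] sentinel. A minimal parser sketch (iter_sse is an illustrative name, not an SDK function):

```python
import json

def iter_sse(lines):
    """Yield parsed chunk dicts from 'data: ...' lines; stop at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        yield json.loads(payload)
```

Concatenate each chunk's choices[0].delta.content to rebuild the text; the final chunk before [DONE] carries usage instead of choices.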

Examples

Streaming chat

curl https://api.epithre.com/v1/chat/completions \
  -H "Authorization: Bearer $EPITHRE_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "epithre-omni",
    "messages": [{"role": "user", "content": "Tulis 3 fakta tentang Jakarta"}],
    "stream": true,
    "max_tokens": 200
  }'
stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[{"role": "user", "content": "Tulis 3 fakta tentang Jakarta"}],
    stream=True,
    max_tokens=200,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
  model: "epithre-omni",
  messages: [{ role: "user", content: "Tulis 3 fakta tentang Jakarta" }],
  stream: true,
  max_tokens: 200,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Vision (omni/lyt)

import base64
img_b64 = base64.b64encode(open("invoice.jpg", "rb").read()).decode()

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Ekstrak total pembayaran dari invoice ini sebagai angka."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
        ]
    }],
)
print(resp.choices[0].message.content)
IMG=$(base64 -w0 invoice.jpg)
curl https://api.epithre.com/v1/chat/completions \
  -H "Authorization: Bearer $EPITHRE_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg img "$IMG" '{
    "model": "epithre-omni",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Ekstrak total pembayaran sebagai angka."},
        {"type": "image_url", "image_url": {"url": ("data:image/jpeg;base64," + $img)}}
      ]
    }]
  }')"

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city in Indonesia",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name in Indonesian"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}]

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[{"role": "user", "content": "Cuaca Jakarta sekarang gimana?"}],
    tools=tools,
)

# Model decides to call the tool:
tool_call = resp.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # → "get_weather"
print(tool_call.function.arguments)  # → '{"city": "Jakarta", "unit": "celsius"}'

# Execute your function, return result:
followup = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "user", "content": "Cuaca Jakarta sekarang gimana?"},
        resp.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": '{"temp": 31, "unit": "C", "condition": "cerah berawan"}'},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)
# → "Jakarta sekarang 31°C, cerah berawan."

JSON mode

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Selalu balas dalam JSON."},
        {"role": "user", "content": "Ekstrak nama, umur, pekerjaan dari: 'Pak Budi, 45 tahun, dokter di RSCM'"},
    ],
    response_format={"type": "json_object"},
)
import json
data = json.loads(resp.choices[0].message.content)
# → {"nama": "Pak Budi", "umur": 45, "pekerjaan": "dokter"}

Common errors

| HTTP | Code | Cause | Fix |
|---|---|---|---|
| 400 | model_not_found | Bad model value | Use one of the 3 chat models |
| 400 | invalid_request_error | Empty messages array | Send at least one message |
| 429 | backend_busy | Aggregate Epithre traffic at backend cap | Retry in ~1 second (exponential backoff) |
| 429 | concurrency_exceeded | Your key's concurrent cap hit | Reduce parallelism or raise cap in dashboard |

Embeddings

POST /v1/embeddings

Convert text into 4000-dim L2-normalized vectors for semantic search, RAG, clustering, and classification. Native model: Qwen3-Embed-8B (Indonesian-optimized, MRL-truncatable).

Semantic search

Embed your document corpus once, store in a vector DB (pgvector, Pinecone, Qdrant). Embed each user query and find nearest neighbors.

RAG retrieval

Pre-step before chat. Pair with rerank to push relevance over 95%. See RAG recipe.

Classification

Compute centroid embeddings per class, then assign new texts to nearest centroid. Strong baseline before training a model.
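The centroid approach can be sketched in a few lines of numpy; centroid_classify is an illustrative helper, and the vectors are assumed to come from epithre-embed (already L2-normalized):

```python
import numpy as np

def centroid_classify(class_vecs, query_vec):
    """class_vecs: {label: (n_i, d) array of that class's embeddings}.
    Returns the label whose centroid is most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)

    def sim(label):
        c = class_vecs[label].mean(axis=0)      # class centroid
        return (c / np.linalg.norm(c)) @ q      # cosine similarity

    return max(class_vecs, key=sim)
```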

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Must be epithre-embed |
| input | string or array | yes | Single text or 1-64 strings per request |
| dimensions | int | no | 1-4000. If < 4000, MRL truncates & re-L2-normalizes server-side. Lossless prefix. |
| instruction | string | no | Qwen3 native task instruction prefix (e.g., "Retrieve relevant passages about Indonesian forestry law") |
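The dimensions behavior is reproducible client-side, which is handy if you store full 4000-dim vectors and want to truncate later. A sketch (mrl_truncate is an illustrative name):

```python
import numpy as np

def mrl_truncate(embedding, dims):
    """Keep the first `dims` components, then re-L2-normalize,
    matching what the server does when `dimensions` < 4000."""
    v = np.asarray(embedding, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)
```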

Response shape

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0107, -0.0022, 0.0152, ...],  // 4000 floats (or `dimensions` if set)
      "index": 0
    },
    {
      "object": "embedding",
      "embedding": [...],
      "index": 1
    }
  ],
  "model": "epithre-embed",
  "usage": {"prompt_tokens": 24, "total_tokens": 24}
}

Examples

Embed a batch with MRL truncation

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "penebangan liar di hutan lindung",
        "perlindungan satwa dilindungi UU 5/1990",
        "kebijakan ekspor batubara 2024",
    ],
    dimensions=1024,  # Truncate for smaller storage; still good quality
)
import numpy as np
vecs = np.array([e.embedding for e in resp.data])
print(vecs.shape)  # → (3, 1024)
print("cost:", resp.usage.prompt_tokens, "input tokens")
curl https://api.epithre.com/v1/embeddings \
  -H "Authorization: Bearer $EPITHRE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "epithre-embed",
    "input": ["penebangan liar di hutan lindung", "perlindungan satwa dilindungi UU 5/1990"],
    "dimensions": 1024
  }'

With instruction prefix (Qwen3 native)

resp = client.embeddings.create(
    model="epithre-embed",
    input=["UU 41/1999 tentang Kehutanan pasal 50"],
    instruction="Given a legal query, retrieve relevant Indonesian regulations",
)
Performance tip: Batch up to 64 inputs per request. One batch is roughly 10× faster and cheaper than 64 individual calls. Embeddings are deterministic, so results are safe to cache by input hash.
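Because embeddings are deterministic, a content-hash cache avoids paying twice for repeated inputs. A sketch; cached_embed and its embed_fn callback (a thin wrapper you would write around client.embeddings.create) are illustrative:

```python
import hashlib

_cache = {}

def _key(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def cached_embed(embed_fn, texts):
    """embed_fn: callable taking a list of texts, returning a list of vectors.
    Only texts not already cached are sent to the API."""
    missing = [t for t in texts if _key(t) not in _cache]
    if missing:
        for t, vec in zip(missing, embed_fn(missing)):
            _cache[_key(t)] = vec
    return [_cache[_key(t)] for t in texts]
```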

Rerank

POST /v1/rerank

Cohere-compatible reranking endpoint. After vector search returns top-K candidates, rerank narrows to the most relevant ones using a heavier cross-encoder model. Boosts Indonesian retrieval quality from ~85% to ~98% in our benchmarks.

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Must be epithre-rerank |
| query | string | yes | The user search query |
| documents | array | yes | 1-64 candidate documents (typically top-K from embed search) |
| top_n | int | no | Return top N after sorting (default: all, sorted descending) |
| return_documents | bool | no | Include document text in response (default false = just indices + scores) |
| instruction | string | no | Custom reranking instruction (overrides default) |

Response shape

{
  "id": "2b475d38-c17c-43b9-ba9a-7f485af38e0f",
  "results": [
    {"index": 3, "relevance_score": 0.1480, "document": {"text": "..."}},
    {"index": 0, "relevance_score": 0.0567, "document": {"text": "..."}},
    {"index": 1, "relevance_score": 0.0000}
  ],
  "meta": {"billed_units": {"search_units": 1}}
}

Example

import os

import httpx

EPITHRE_KEY = os.environ["EPITHRE_KEY"]

resp = httpx.post(
    "https://api.epithre.com/v1/rerank",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-rerank",
        "query": "perlindungan hutan lindung dari penebangan",
        "documents": [
            "UU 41/1999 pasal 50 — perusakan hutan",
            "Permen LHK satwa dilindungi",
            "Keppres pemilu 2024",
            "Pasal pengelolaan hutan lindung",
        ],
        "top_n": 3,
        "return_documents": True,
    },
).json()

for r in resp["results"]:
    print(f'{r["relevance_score"]:.3f}  {r["document"]["text"]}')

# → 0.148  Pasal pengelolaan hutan lindung
# → 0.057  UU 41/1999 pasal 50 — perusakan hutan
# → 0.000  Permen LHK satwa dilindungi   ← irrelevant, low score
Score semantics: Scores are P(yes) / (P(yes)+P(no)) from the model. Range [0, 1]. Indonesian queries often produce low absolute values (0.05-0.30 for true matches). Use rank order, not absolute threshold.

Image Generation

POST /v1/images/generations

Text-to-image using FLUX-Klein. Returns inline base64-encoded PNG. Optional style LoRAs for "dark" or "anime" aesthetics.

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Must be epithre-iris |
| prompt | string | yes | Max 2000 chars. English works best; Indonesian also supported. |
| size | string | no | "WxH", max 960×960. Default 768×768. Width & height rounded down to nearest multiple of 16. |
| n | int | no | Currently fixed at 1 (batch not yet supported) |
| response_format | string | no | "b64_json" (default and only supported) |
| num_steps | int | no | 1-50, default 4. More steps = slower but slightly higher quality. |
| seed | int | no | Seed for reproducibility. -1 = random. |
| guidance_scale | float | no | Default 1.0. Higher = more literal prompt adherence. |
| lora | string | no | "none" (default), "dark" (moody/cinematic), "anime" |
| lora_strength | float | no | 0.0-1.5, default 0.6 (for non-none LoRAs) |
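Since width and height are rounded down to multiples of 16 and capped at 960, you can normalize sizes client-side to avoid surprises; snap_size is an illustrative helper based on the rules above:

```python
def snap_size(width, height, max_side=960):
    """Clamp each side to max_side and round down to a multiple of 16 (min 16)."""
    def snap(x):
        return max(16, min(x, max_side) // 16 * 16)
    return f"{snap(width)}x{snap(height)}"
```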

Response shape

{
  "created": 1778455000,
  "data": [
    {"b64_json": "iVBORw0KGgoAAAANSUhEUgAA..."}  // base64-encoded PNG
  ]
}

Examples

Basic generation

resp = client.images.generate(
    model="epithre-iris",
    prompt="a serene Indonesian beach at sunset, photorealistic, golden hour",
    size="768x768",
)

import base64
img_bytes = base64.b64decode(resp.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(img_bytes)

With anime LoRA

resp = httpx.post(
    "https://api.epithre.com/v1/images/generations",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-iris",
        "prompt": "a young samurai in a bamboo forest, cherry blossoms falling",
        "size": "768x768",
        "lora": "anime",
        "lora_strength": 0.8,
        "seed": 42,
    },
).json()
Latency: ~12-19 seconds for 768×768 @ 4 steps. Anime LoRA adds ~5s. For preview/iteration, use 4 steps; for final renders, 20-30 steps.

Image Edit

POST /v1/images/edits

Edit an existing image with a text prompt. Supports single source or up to 5 reference images for compositional editing (e.g., "put the product from img1 in the setting from img2").

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | epithre-iris |
| prompt | string | yes | Edit instruction (e.g., "change background to sunset", "make the dress red") |
| image | string (base64) | conditional | Single source image. PNG/JPEG/WebP/GIF, max 20 MB decoded. Mutually exclusive with images. |
| images | array | conditional | 1-5 reference images for compositional editing. Mutually exclusive with image. |
| size | string | no | Max 704×704 for edit. Default matches input. |
| strength | float | no | 0.0-1.0, default 0.75. Higher = bigger change from source. |
| num_inference_steps | int | no | 1-50, default 4 |
| lora | string | no | Same options as generation |

Response shape

Same as /v1/images/generations: {"created", "data": [{"b64_json"}]}

Example: single-image edit

import base64
src = base64.b64encode(open("original.png", "rb").read()).decode()

resp = httpx.post(
    "https://api.epithre.com/v1/images/edits",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-iris",
        "prompt": "change the sky to dramatic stormy clouds with lightning",
        "image": src,
        "size": "512x512",
        "strength": 0.7,
    },
).json()

Example: multi-reference compositing

imgs = [base64.b64encode(open(f"ref_{i}.png", "rb").read()).decode()
        for i in range(3)]

resp = httpx.post(
    "https://api.epithre.com/v1/images/edits",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "model": "epithre-iris",
        "prompt": "the product from image 1, displayed in the studio setting from image 2, in the photography style of image 3",
        "images": imgs,
        "size": "640x640",
    },
).json()
Image format check: The API validates magic bytes — only PNG, JPEG, WebP, GIF accepted. Data URIs like data:image/png;base64,... are auto-stripped. URL inputs are rejected (SSRF guard).
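You can mirror the server's magic-byte validation before uploading and fail fast on unsupported formats. A sketch; sniff_image is an illustrative helper covering the four accepted formats:

```python
from typing import Optional

def sniff_image(data: bytes) -> Optional[str]:
    """Detect PNG/JPEG/WebP/GIF by magic bytes; None means the API will reject it."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```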
End-to-end patterns

Recipes

RAG: embed + rerank + chat

The canonical pattern: index your knowledge base with embed, retrieve top-K with vector search, narrow with rerank, then synthesize an answer with chat.

import os, httpx, numpy as np
from openai import OpenAI

EK = os.environ["EPITHRE_KEY"]
client = OpenAI(api_key=EK, base_url="https://api.epithre.com/v1")

# 1) ONE-TIME: embed your corpus
corpus = [
    "UU 41/1999 pasal 50 — barangsiapa merusak hutan akan dipidana...",
    "Permen LHK No. 92/2018 tentang pengelolaan satwa liar...",
    "PP No. 23/2021 tentang penyelenggaraan kehutanan...",
    # ... thousands of docs
]
e = client.embeddings.create(model="epithre-embed", input=corpus, dimensions=1024)
corpus_vecs = np.array([row.embedding for row in e.data])  # (N, 1024)

# 2) AT QUERY TIME: embed the question, find top-K nearest
question = "Apa hukuman untuk perusakan hutan lindung?"
qe = client.embeddings.create(model="epithre-embed", input=question, dimensions=1024)
qv = np.array(qe.data[0].embedding)
scores = corpus_vecs @ qv  # cosine sim (already L2-normalized)
top_k_idx = np.argsort(-scores)[:10]
candidates = [corpus[i] for i in top_k_idx]

# 3) Rerank to push the most relevant to the top
r = httpx.post("https://api.epithre.com/v1/rerank",
    headers={"Authorization": f"Bearer {EK}"},
    json={"model": "epithre-rerank", "query": question, "documents": candidates, "top_n": 3, "return_documents": True},
).json()
context = "\n\n".join(item["document"]["text"] for item in r["results"])

# 4) Generate answer with retrieved context
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Jawab berdasarkan konteks yang diberikan. Sebutkan pasal/peraturan yang relevan."},
        {"role": "user", "content": f"Konteks:\n{context}\n\nPertanyaan: {question}"},
    ],
)
print(resp.choices[0].message.content)

Vision QA on documents

Extract structured data from invoices, receipts, KTP, or any document image:

import base64, json
img = base64.b64encode(open("invoice.jpg", "rb").read()).decode()

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Ekstrak data invoice ini sebagai JSON dengan field: vendor, tanggal, nomor_invoice, item (list), subtotal, ppn, total."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img}"}}
        ]
    }],
    response_format={"type": "json_object"},
)
data = json.loads(resp.choices[0].message.content)
print(data["total"])

Build an image generation app

import base64, os

import httpx

EPITHRE_KEY = os.environ["EPITHRE_KEY"]

def generate_thumbnail(topic: str) -> bytes:
    """Generate a marketing thumbnail for a given topic."""
    resp = httpx.post("https://api.epithre.com/v1/images/generations",
        headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
        json={
            "model": "epithre-iris",
            "prompt": f"professional marketing thumbnail for: {topic}, vibrant colors, no text",
            "size": "768x768",
            "num_steps": 8,  # higher quality
            "guidance_scale": 1.5,
        },
        timeout=60,
    ).json()
    return base64.b64decode(resp["data"][0]["b64_json"])

thumb = generate_thumbnail("Indonesian food blog — rendang recipe")
open("thumb.png", "wb").write(thumb)

Function calling for agents

Build an agent that can call functions. Loop until finish_reason == "stop" or no tool calls remain:

import json

def get_weather(city: str) -> dict:
    # your real implementation
    return {"city": city, "temp_c": 31, "condition": "cerah berawan"}

def search_news(query: str) -> list:
    return [...]

tool_handlers = {"get_weather": get_weather, "search_news": search_news}
tool_defs = [...]  # OpenAI tools schema

messages = [{"role": "user", "content": "Cuaca Jakarta + berita terbaru tentang banjir"}]

for _ in range(5):  # max 5 rounds
    resp = client.chat.completions.create(model="epithre-omni", messages=messages, tools=tool_defs)
    msg = resp.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        print(msg.content)
        break

    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        result = tool_handlers[tc.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(result),
        })
Operational reference

Error handling

All errors return JSON with this envelope (same as OpenAI):

{
  "error": {
    "message": "Human-readable description",
    "type": "error_category",
    "code": "specific_error_code"
  }
}

Full error catalog

| HTTP | Type | Common codes | What it means | Recovery |
|---|---|---|---|---|
| 400 | invalid_request_error | model_not_found, tos_required, email_not_verified | Bad parameters in request body | Fix client code, don't retry |
| 401 | authentication_error | authentication_error | Missing/invalid/revoked API key | Check Authorization header; create new key if revoked |
| 402 | insufficient_quota | insufficient_quota | API key has $0 balance | Top up via dashboard (email hello@epithre.com) |
| 403 | permission_error | account_suspended, email_not_verified, permission_error | Account suspended OR endpoint needs admin OR email not verified | Check email inbox or contact support |
| 413 | request_too_large | request_too_large | Body exceeds 1 MB (text) or 50 MB (images) | Reduce payload size; truncate prompts; resize images |
| 429 | rate_limit_error | rpm_exceeded, rpd_exceeded, concurrency_exceeded, backend_busy | Various rate limits hit | Exponential backoff (1s, 2s, 4s, …). Increase limits in dashboard if persistent. |
| 502 | backend_error | backend_error | Inference backend returned 5xx | Retry once with backoff |
| 503 | backend_unavailable | backend_unavailable | Backend not reachable (planned maintenance or outage) | Retry with longer backoff; check dashboard health |
| 504 | backend_timeout | backend_timeout | Backend slow / hung | Retry with shorter prompt or simpler request |

Recommended retry pattern

import random
import time

import openai

def call_with_retry(call_fn, max_retries=4):
    for attempt in range(max_retries):
        try:
            return call_fn()
        except openai.APIError as e:
            if e.status_code in (429, 502, 503, 504):
                if attempt == max_retries - 1: raise
                wait = (2 ** attempt) + random.uniform(0, 1)  # jitter
                time.sleep(wait)
                continue
            raise  # don't retry 400/401/402/403

Rate limits

Two layers of limits apply to every request:

Per-key limits (yours to control)

| Limit | Default | Where to change |
|---|---|---|
| Requests per minute (RPM) | 60 | Dashboard → Keys → edit |
| Requests per day (RPD) | 10,000 | Dashboard → Keys → edit |
| Concurrent requests | 10 | Dashboard → Keys → edit |
| Monthly spend cap | $100 | Dashboard → Keys → edit |

Exceeding any of these returns HTTP 429 with a code indicating which limit was hit.
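To stay under the concurrent-request cap from the client side, gate calls with a semaphore sized to your key's limit. A sketch; bounded_call is an illustrative wrapper, not an SDK feature:

```python
import threading

MAX_CONCURRENT = 10  # match your key's concurrent-request limit
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def bounded_call(fn, *args, **kwargs):
    """Block until a slot is free, so bursts never trip concurrency_exceeded."""
    with _slots:
        return fn(*args, **kwargs)
```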

Backend capacity (shared)

Each chat model has an aggregate concurrent cap across all Epithre customers, to reserve capacity for other services. When hit, you get HTTP 429 with code: "backend_busy". Wait ~1 second and retry — these clear quickly.

Pricing

All prices in USD. Billed per request, deducted from your active API key's credit balance immediately.

| Model | Input | Output | Unit price |
|---|---|---|---|
| epithre-prme | $0.40 / 1M tok | $1.60 / 1M tok | token |
| epithre-omni | $0.30 / 1M tok | $1.20 / 1M tok | token |
| epithre-lyt | $0.05 / 1M tok | $0.20 / 1M tok | token |
| epithre-embed | $0.04 / 1M tok | — | input token only |
| epithre-rerank | — | — | $0.000002 / document |
| epithre-iris | — | — | $0.005 / image |
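Token-billed chat costs can be estimated directly from a response's usage block; chat_cost is an illustrative helper using the per-token prices above:

```python
PRICES_PER_M = {  # USD per 1M tokens (input, output), from the table above
    "epithre-prme": (0.40, 1.60),
    "epithre-omni": (0.30, 1.20),
    "epithre-lyt": (0.05, 0.20),
}

def chat_cost(model, prompt_tokens, completion_tokens):
    """Estimated USD cost of one chat completion."""
    p_in, p_out = PRICES_PER_M[model]
    return (prompt_tokens * p_in + completion_tokens * p_out) / 1_000_000
```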

Topping up

During alpha, top-up is handled by email: send to hello@epithre.com with the amount and we'll reply with payment details (BCA/Wise/etc). Credit usually applied within 1 business day after payment clears. Stripe/Midtrans integration is planned for general availability.

SDKs & clients

Epithre is OpenAI-wire-compatible. Use any OpenAI SDK and override base_url:

LangChain

import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="epithre-omni",
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",
)
llm.invoke("Halo!")

LlamaIndex

import os

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai_like import OpenAILikeEmbedding

Settings.llm = OpenAILike(
    model="epithre-omni",
    api_key=os.environ["EPITHRE_KEY"],
    api_base="https://api.epithre.com/v1",
)
Settings.embed_model = OpenAILikeEmbedding(
    model_name="epithre-embed",
    api_key=os.environ["EPITHRE_KEY"],
    api_base="https://api.epithre.com/v1",
)

For endpoints not in OpenAI SDK (rerank, image edit)

Use plain HTTP via your language's standard client (Python httpx, JS fetch, Go net/http). See examples above.

Data policy (summary)

Full policy: privacy.html.

FAQ

Can I use Epithre to power a customer-facing product?

Yes. The standard ToS allows commercial use. Don't resell raw API access as your own product without a separate reseller agreement (contact us). See Terms.

How does Indonesian quality compare to OpenAI?

Our models are specifically fine-tuned on Indonesian data and outperform comparable OpenAI tier models on Indonesian benchmarks (semantic similarity, instruction following, cultural context). Run your own A/B on real production traffic — that's the only honest benchmark.

What's your uptime SLA?

Best-effort during alpha (no contractual SLA). We monitor 24/7 via Uptime Kuma. In practice we target 99.5%+. For mission-critical workloads with hard SLA, contact us for enterprise terms.

Can I run Epithre on-prem?

Not yet — the platform is currently cloud-only. On-prem licensing is on the roadmap for enterprise.

How do I report abuse or a security issue?

Email hello@epithre.com. For security issues, use subject "Security disclosure". We'll respond within 1 business day.

What happens if I hit my monthly spend cap?

Your key returns 402 insufficient_quota for the rest of the month. Raise the cap in the dashboard or top up to continue.

Can I get my data exported / account deleted?

Yes. Email hello@epithre.com. Export typically delivered as CSV within 7 days. Account deletion erases all linked data within 30 days, except billing records (retained 7 years per Indonesian tax law).