Migrate from OpenAI

Currency note: Epithre prices are in IDR (Rupiah). Competitor prices are left in USD for comparison.

Most OpenAI customers can migrate in under 5 minutes. The wire format is identical; only the base URL, API key, and model IDs change.

TL;DR

# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
resp = client.chat.completions.create(model="gpt-4o", messages=[...])

# After
from openai import OpenAI
client = OpenAI(
    api_key="esk_live_...",
    base_url="https://api.epithre.com/v1",
)
resp = client.chat.completions.create(model="epithre-omni", messages=[...])

Three lines change: api_key, base_url, and the model ID.

Model mapping

OpenAI model	Epithre equivalent	When to use which
`gpt-4o`, `gpt-4.1`	`epithre-omni`	General multimodal chat. Same role: flagship default.
`o1`, `o1-pro`, `o3-mini`	`epithre-prme`	Long context (200K), reasoning. Pass `extra_body={"chat_template_kwargs": {"enable_thinking": True}}` for explicit reasoning chains.
`gpt-4o-mini`, `gpt-3.5-turbo`	`epithre-lyt`	Fast, cheap. Same role: high-throughput cost-sensitive workloads.
`text-embedding-3-large`	`epithre-embed`	4000-dim, MRL-truncatable. Plus image embed in the same vector space (cross-modal).
`text-embedding-3-small`	`epithre-embed` + `dimensions=1024`	Truncate to 1024-dim for smaller storage.
`dall-e-3`	`epithre-iris`	Text-to-image. Plus multi-reference editing.
`gpt-4o-realtime`	(not available)	Real-time audio not yet offered.
`whisper-1`	(not available)	Audio transcription not yet offered.
`tts-1`, `tts-1-hd`	(not available)	TTS not yet offered.

If your workload uses Whisper, TTS, or Realtime, keep those on OpenAI for now and use Epithre for the chat/embed/image pieces.
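If model IDs are scattered across a codebase, a small lookup can centralize the mapping. The sketch below only encodes the table above; `MODEL_MAP` and `to_epithre_model` are hypothetical names, not part of any SDK.

```python
# Mapping copied from the table above; None marks models with no Epithre equivalent yet.
MODEL_MAP = {
    "gpt-4o": "epithre-omni",
    "gpt-4.1": "epithre-omni",
    "o1": "epithre-prme",
    "o1-pro": "epithre-prme",
    "o3-mini": "epithre-prme",
    "gpt-4o-mini": "epithre-lyt",
    "gpt-3.5-turbo": "epithre-lyt",
    "text-embedding-3-large": "epithre-embed",
    "text-embedding-3-small": "epithre-embed",  # pair with dimensions=1024
    "dall-e-3": "epithre-iris",
    "gpt-4o-realtime": None,
    "whisper-1": None,
    "tts-1": None,
    "tts-1-hd": None,
}


def to_epithre_model(openai_model: str) -> str:
    """Return the Epithre model ID for an OpenAI model ID (hypothetical helper)."""
    if openai_model not in MODEL_MAP:
        raise KeyError(f"no mapping known for {openai_model!r}")
    target = MODEL_MAP[openai_model]
    if target is None:
        raise ValueError(f"{openai_model!r} has no Epithre equivalent yet; keep it on OpenAI")
    return target
```

Raising on the unavailable models, rather than silently falling back, makes it obvious which calls still need an OpenAI client.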

What's the same

The wire format is byte-identical. Every standard OpenAI parameter works:

messages, temperature, top_p, max_tokens, stream
tools, tool_choice (including "auto", "none", named tool choice)
response_format (both json_object and json_schema strict mode)
seed, stop, frequency_penalty, presence_penalty
SSE streaming chunks have the same shape (data: {...} lines, final [DONE])
Tool calls in messages have the same tool_calls / tool_call_id structure
Vision via content: [{type:"image_url", image_url:{url:"data:..."}}]
Embeddings response: data[].embedding, usage.prompt_tokens
Error envelope: {"error": {"message", "type", "code"}} with the same HTTP status codes

If your code uses OpenAI's official SDK in any language, it'll work without modification beyond the three lines above.
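If you parse the stream yourself instead of going through an SDK, the chunk shape described above can be handled with a few lines. This is a minimal sketch assuming the standard OpenAI chunk layout (`choices[].delta.content`); `iter_sse_chunks` and `collect_text` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content pieces of a streamed chat completion."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Because the parser depends only on the `data:` framing and the `[DONE]` sentinel, the same code should work against either provider.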

What's different

A handful of additions and quirks worth knowing:

1. Extended thinking is off by default

OpenAI's o1/o3 models default to internal reasoning. Epithre's epithre-omni and epithre-prme default to thinking off to keep latency predictable. Opt in:

resp = client.chat.completions.create(
    model="epithre-prme",
    messages=[...],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

When enabled, the model burns more tokens but produces visibly stronger reasoning on hard problems.

2. Prompt caching is explicit (Anthropic-style)

OpenAI auto-caches prefixes that repeat within a few minutes. Epithre uses an explicit marker so you control what's cached:

messages=[
    {"role": "system", "content": [
        {"type": "text",
         "text": "<long stable system prompt>",
         "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "user", "content": "..."},
]

Cached reads bill at 10% of the input rate. See the prompt caching guide.
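To get a feel for what the cache marker is worth, the sketch below builds the system message shown above and estimates input cost. It assumes the `epithre-omni` input rate of Rp7,000 per 1M tokens from the pricing table and that uncached tokens (including cache writes) bill at the normal input rate, which this page does not confirm. Both function names are hypothetical.

```python
def cached_system_message(prompt: str) -> dict:
    """Build a system message whose text block carries the cache marker shown above."""
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": prompt, "cache_control": {"type": "ephemeral"}}
        ],
    }


def input_cost_rp(uncached_tokens: int, cached_tokens: int,
                  rate_per_million: float = 7_000.0,
                  cached_fraction: float = 0.10) -> float:
    """Estimate input cost in Rupiah; cached tokens bill at cached_fraction of the rate."""
    full = uncached_tokens * rate_per_million / 1_000_000
    cached = cached_tokens * rate_per_million * cached_fraction / 1_000_000
    return full + cached
```

Under these assumptions, a 10,000-token prompt with 9,000 tokens read from cache costs about Rp13.3 of input instead of Rp70.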

3. Embeddings have an `instruction` field

Optional task instruction prefix that improves retrieval quality for specific domains:

resp = client.embeddings.create(
    model="epithre-embed",
    input=texts,
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)

OpenAI embeddings have no equivalent. The field is optional; omit it for OpenAI-identical behavior.

4. `epithre-embed` is multimodal in one vector space

OpenAI's text-embedding-3-* is text-only. epithre-embed accepts both text strings and image objects in the same call, producing vectors that are directly cosine-comparable. Concrete pattern:

import base64

# Read and base64-encode the image; the context manager closes the file promptly.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "anak kucing oren tidur di kasur",   # Indonesian: "an orange kitten sleeping on the bed"
        {"type": "image", "image": img_b64},   # extension to OpenAI shape
    ],
)
# Both vectors are 4000-dim, L2-normalized, cosine-comparable.

See the multimodal guide and the cross-modal RAG recipe.
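Once you have a text vector and an image vector, comparing them is plain vector math. The sketch below shows cosine similarity and a client-side version of MRL truncation (keep the leading components, then rescale to unit length). The `dimensions` parameter does the truncation server-side; treating client-side truncation as equivalent is an assumption, and both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for L2-normalized vectors this reduces to the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def truncate_and_renormalize(vec: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length (MRL-style truncation)."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]
```

With the response from the call above, `cosine(resp.data[0].embedding, resp.data[1].embedding)` would score how well the caption matches the photo.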

5. Batch API uses the same shape, 50% discount

OpenAI's Batches API and Epithre's are wire-compatible. Migration is just the base URL change. Same JSONL input format (custom_id + method + url + body), same poll-or-webhook pattern, same 50% off list price.
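For reference, the JSONL input described above can be generated as follows. This is a sketch of the standard OpenAI batch line shape (`custom_id`, `method`, `url`, `body`); `build_batch_lines` is a hypothetical helper and `epithre-lyt` is just an example default.

```python
import json


def build_batch_lines(prompts: dict[str, str], model: str = "epithre-lyt") -> list[str]:
    """Return one JSONL line per prompt, keyed by custom_id."""
    lines = []
    for custom_id, prompt in prompts.items():
        request = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines
```

Write `"\n".join(lines)` to a `.jsonl` file and upload it the same way you do for OpenAI batches today.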

6. Rate limits

OpenAI's tier system (Tier 1 through Tier 5) doesn't exist here. Default per-key limits are 60 RPM, 10K RPD, 10 concurrent requests, and a Rp1,000,000 monthly cap. Raise them in the dashboard.
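With a 60 RPM default, bursty workloads will see 429s sooner than on a higher OpenAI tier. The sketch below shows one common way to handle that, exponential backoff with jitter. `RateLimited` is a stand-in exception and `with_backoff` a hypothetical helper; with the official Python SDK you would catch `openai.RateLimitError` instead, or rely on the client's built-in `max_retries` option.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for whatever your client raises on HTTP 429 (hypothetical)."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the delay each time and adding jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the retry logic can be unit-tested without actually waiting.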

7. Body size limits

OpenAI doesn't publish hard per-endpoint body size limits. Epithre's:

/v1/chat/completions, /v1/rerank, etc.: 1 MB default
/v1/embeddings: 25 MB (image inputs)
/v1/images/*, /v1/files: 50 MB

If you hit 413, see the error reference.
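A pre-flight size check can catch oversized JSON requests before they reach the API. The sketch below encodes the limits listed above. It assumes 1 MB means 1024 * 1024 bytes, which this page does not specify, and that the serialized size your HTTP client sends is close to what `json.dumps` produces. `body_limit` and `fits` are hypothetical helpers.

```python
import json

MB = 1024 * 1024  # assumption: the limits above are binary megabytes

# Limits from the list above; paths not listed fall back to the 1 MB default.
BODY_LIMITS = {
    "/v1/embeddings": 25 * MB,
    "/v1/images/": 50 * MB,
    "/v1/files": 50 * MB,
}
DEFAULT_LIMIT = 1 * MB


def body_limit(path: str) -> int:
    """Return the byte limit for a request path, matching by prefix."""
    for prefix, limit in BODY_LIMITS.items():
        if path.startswith(prefix):
            return limit
    return DEFAULT_LIMIT


def fits(path: str, body: dict) -> bool:
    """True if the JSON-encoded body is within the limit for `path`."""
    size = len(json.dumps(body).encode("utf-8"))
    return size <= body_limit(path)
```

This only applies to JSON bodies; multipart uploads would need their size measured differently.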

8. No fine-tuning yet

OpenAI offers self-serve fine-tuning. Epithre doesn't yet. For now: if you have a specific use case, email hello@epithre.com with the dataset description; we run custom training as a managed service while we build out the self-serve flow.

Pricing comparison

Approximate prices as of 2026-05. OpenAI prices change; check their pricing page for current rates.

Workload	OpenAI typical	Epithre
GPT-4o input	$2.50 / 1M tok	`epithre-omni` Rp7,000 / 1M tok
GPT-4o output	$10 / 1M tok	`epithre-omni` Rp25,000 / 1M tok
text-embedding-3-large	$0.13 / 1M tok	`epithre-embed` Rp1,500 / 1M tok
Batch discount	50%	50%
Prompt cache read	50% of input	10% of input

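At an exchange rate of around Rp16,000 per USD (an assumption; the rate moves), that works out to roughly 80% cheaper for chat and roughly 25-30% cheaper for embeddings at list price. The bigger savings often come from prompt caching: cached reads bill at 10% of the input rate versus OpenAI's 50%, a 5x lower read rate.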
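Because the two price columns use different currencies, the percentage depends on the exchange rate and differs by workload, so it is worth recomputing with a current rate. The sketch below uses the table's prices; `IDR_PER_USD = 16_000` is an assumed rate, not a figure from this page.

```python
IDR_PER_USD = 16_000  # assumption: replace with the current exchange rate

# (OpenAI USD per 1M tokens, Epithre Rp per 1M tokens), from the table above
PRICES = {
    "chat input": (2.50, 7_000),
    "chat output": (10.00, 25_000),
    "embeddings": (0.13, 1_500),
}


def savings_pct(usd_price: float, idr_price: float,
                idr_per_usd: float = IDR_PER_USD) -> float:
    """Percentage saved versus the USD price after converting Rupiah to USD."""
    return (1 - (idr_price / idr_per_usd) / usd_price) * 100


for name, (usd, idr) in PRICES.items():
    print(f"{name}: {savings_pct(usd, idr):.1f}% cheaper")
```

At the assumed rate this gives roughly 82% for chat input, 84% for chat output, and 28% for embeddings.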

Working example: full migration of a RAG service

# BEFORE (OpenAI)
import os

from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def embed_corpus(docs):
    r = client.embeddings.create(model="text-embedding-3-small", input=docs)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content

# AFTER (Epithre) - 4 line changes total
import os

from openai import OpenAI
client = OpenAI(
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",
)

def embed_corpus(docs):
    r = client.embeddings.create(model="epithre-embed", input=docs, dimensions=1024)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="epithre-lyt",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content

For RAG, you'd typically also add a rerank step using epithre-rerank for ~95%+ retrieval quality. See the RAG cookbook.

Migration checklist

[ ] Sign up at platform.epithre.com, verify email, claim Rp50,000 credit.
[ ] Create a production API key. Store in your secret manager.
[ ] Change api_key + base_url in your client init.
[ ] Map model IDs using the model mapping table, e.g. gpt-4o -> epithre-omni, text-embedding-3-* -> epithre-embed.
[ ] Run your existing eval suite against Epithre. Compare quality on Indonesian-heavy inputs.
[ ] (Optional) Replace OpenAI auto-prompt-cache with Epithre explicit cache_control on long system prompts.
[ ] (Optional) If you have an Indonesian RAG corpus, re-embed it with epithre-embed for native-quality vectors. Existing OpenAI embeddings don't live in the same vector space, so they can't be compared against epithre-embed query vectors.
[ ] Update monitoring: error type names are the same, but you may want to track Epithre-specific fields like usage.cache_read_input_tokens.
[ ] Update rate limit handling: same 429 pattern, but new per-key cap defaults.

That's it. Email hello@epithre.com if you hit a wall.