Migrate from OpenAI

Currency note: Epithre prices are in IDR (Rupiah). Competitor prices are left in USD for reference.

Most OpenAI customers can migrate in under 5 minutes. The wire format is identical; only the base URL, API key, and model IDs change.

TL;DR

# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
resp = client.chat.completions.create(model="gpt-4o", messages=[...])

# After
from openai import OpenAI
client = OpenAI(
    api_key="esk_live_...",
    base_url="https://api.epithre.com/v1",
)
resp = client.chat.completions.create(model="epithre-omni", messages=[...])

Three lines change: api_key, base_url, and the model ID.
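If you want to switch providers without editing code, the three changes can be driven from the environment. The following is a minimal sketch, not an official helper: client_kwargs is a hypothetical function name, and it assumes you store keys in EPITHRE_KEY and OPENAI_API_KEY.

```python
import os

EPITHRE_BASE_URL = "https://api.epithre.com/v1"  # base URL from the example above


def client_kwargs(env=os.environ):
    """Return keyword arguments for OpenAI(...) based on which key is set.

    Assumption: EPITHRE_KEY and OPENAI_API_KEY are the env var names you use.
    """
    if env.get("EPITHRE_KEY"):
        return {"api_key": env["EPITHRE_KEY"], "base_url": EPITHRE_BASE_URL}
    # Fall back to OpenAI; the SDK then uses its default base URL.
    return {"api_key": env["OPENAI_API_KEY"]}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs())
```

The model ID still has to change per call; this only covers the key and base URL.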

Model mapping

| OpenAI model | Epithre equivalent | When to use which |
|---|---|---|
| gpt-4o, gpt-4.1 | epithre-omni | General multimodal chat. Same role: flagship default. |
| o1, o1-pro, o3-mini | epithre-prme | Long context (200K), reasoning. Use chat_template_kwargs={"enable_thinking": True} for explicit reasoning chains. |
| gpt-4o-mini, gpt-3.5-turbo | epithre-lyt | Fast, cheap. Same role: high-throughput cost-sensitive workloads. |
| text-embedding-3-large | epithre-embed | 4000-dim, MRL-truncatable. Plus image embed in the same vector space (cross-modal). |
| text-embedding-3-small | epithre-embed + dimensions=1024 | Truncate to 1024-dim for smaller storage. |
| dall-e-3 | epithre-iris | Text-to-image. Plus multi-reference editing. |
| gpt-4o-realtime | (not available) | Real-time audio not yet offered. |
| whisper-1 | (not available) | Audio transcription not yet offered. |
| tts-1, tts-1-hd | (not available) | TTS not yet offered. |

If your workload uses Whisper, TTS, or Realtime, keep those on OpenAI for now and use Epithre for the chat/embed/image pieces.
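One way to handle a split deployment is a small lookup that translates existing model names and decides which provider serves each call. This is only a sketch: MODEL_MAP and route are hypothetical names, and the mapping is copied from the table above.

```python
# Mapping copied from the table above; None means no Epithre equivalent yet.
MODEL_MAP = {
    "gpt-4o": "epithre-omni",
    "gpt-4.1": "epithre-omni",
    "o1": "epithre-prme",
    "o1-pro": "epithre-prme",
    "o3-mini": "epithre-prme",
    "gpt-4o-mini": "epithre-lyt",
    "gpt-3.5-turbo": "epithre-lyt",
    "text-embedding-3-large": "epithre-embed",
    "text-embedding-3-small": "epithre-embed",  # also pass dimensions=1024
    "dall-e-3": "epithre-iris",
    "gpt-4o-realtime": None,
    "whisper-1": None,
    "tts-1": None,
    "tts-1-hd": None,
}


def route(openai_model):
    """Return (provider, model_id) for a model name used in existing code."""
    if openai_model not in MODEL_MAP:
        raise KeyError(f"no mapping defined for {openai_model!r}")
    target = MODEL_MAP[openai_model]
    if target is None:
        # Audio and realtime workloads stay on OpenAI for now.
        return ("openai", openai_model)
    return ("epithre", target)
```

The caller would keep two clients, one per provider, and pick between them using the first element of the returned tuple.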

What's the same

The wire format is byte-identical. Every standard OpenAI parameter works.

If your code uses OpenAI's official SDK in any language, it'll work without modification beyond the three lines above.

What's different

A handful of additions and quirks worth knowing:

1. Extended thinking is off by default

OpenAI's o1/o3 models default to internal reasoning. Epithre's epithre-omni and epithre-prme default to thinking off to keep latency predictable. Opt in:

resp = client.chat.completions.create(
    model="epithre-prme",
    messages=[...],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

When enabled, the model burns more tokens but produces visibly stronger reasoning on hard problems.

2. Prompt caching is explicit (Anthropic-style)

OpenAI auto-caches prefixes that repeat within a few minutes. Epithre uses an explicit marker so you control what's cached:

messages=[
    {"role": "system", "content": [
        {"type": "text",
         "text": "<long stable system prompt>",
         "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "user", "content": "..."},
]

Cached reads are billed at 10% of the input rate. See prompt caching guide.
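To avoid repeating the marker by hand, you can wrap it in a helper. The sketch below only builds the message structure shown above; cached_system_message and build_messages are hypothetical names, not part of any SDK.

```python
def cached_system_message(prompt_text):
    """Build a system message whose text block carries the cache marker."""
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": prompt_text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }


def build_messages(system_prompt, user_text):
    """Put the stable, cacheable prompt first and the variable user turn last."""
    return [
        cached_system_message(system_prompt),
        {"role": "user", "content": user_text},
    ]
```

Keeping the system prompt identical between calls matters here, since the marker asks for that exact block to be cached.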

3. Embeddings have an instruction field

An optional task-instruction prefix can improve retrieval quality for specific domains:

resp = client.embeddings.create(
    model="epithre-embed",
    input=texts,
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)

OpenAI embeddings have no equivalent. The field is optional; omit it for OpenAI-identical behavior.

4. epithre-embed is multimodal in one vector space

OpenAI's text-embedding-3-* is text-only. epithre-embed accepts both text strings and image objects in the same call, producing vectors that are directly cosine-comparable. Concrete pattern:

import base64

# Read the image and base64-encode it; the with-block closes the file.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "anak kucing oren tidur di kasur",
        {"type": "image", "image": img_b64},   # extension to OpenAI shape
    ],
)
# Both vectors are 4000-dim, L2-normalized, cosine-comparable.

See multimodal guide and cross-modal RAG recipe.
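Because text and image vectors share one space, comparing them takes only a cosine similarity. Here is a minimal pure-Python sketch of ranking images against a text query; cosine and rank_images are hypothetical helper names, and the vectors would come from resp.data[i].embedding in the call above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_vec, image_vecs):
    """Return image indices sorted from most to least similar to the text."""
    scores = [cosine(text_vec, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


# Usage with the response above (index 0 is the text, index 1 the image):
#   text_vec = resp.data[0].embedding
#   image_vec = resp.data[1].embedding
#   score = cosine(text_vec, image_vec)
```

Since the vectors are already L2-normalized, the plain dot product would give the same ranking; the norms are kept so the helper also works on truncated vectors.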

5. Batch API uses the same shape, 50% discount

OpenAI's Batches API and Epithre's are wire-compatible. Migration is just the base URL change. Same JSONL input format (custom_id + method + url + body), same poll-or-webhook pattern, same 50% off list price.
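As an illustration of that JSONL shape, the sketch below builds one line per request. batch_line and build_batch_file are hypothetical helpers, and the /v1/chat/completions url value follows OpenAI's batch format, which is assumed to carry over unchanged.

```python
import json


def batch_line(custom_id, model, messages):
    """Serialize one batch request as a single JSONL line."""
    return json.dumps({
        "custom_id": custom_id,          # your own ID, echoed back in results
        "method": "POST",
        "url": "/v1/chat/completions",   # assumption: same path as OpenAI's format
        "body": {"model": model, "messages": messages},
    })


def build_batch_file(prompts, model="epithre-lyt"):
    """Return JSONL text with one chat request per prompt."""
    lines = [
        batch_line(f"req-{i}", model, [{"role": "user", "content": p}])
        for i, p in enumerate(prompts)
    ]
    return "\n".join(lines) + "\n"
```

Write the returned text to a .jsonl file and upload it the same way you do today.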

6. Rate limits

OpenAI's tier system (Tier 1 through Tier 5) doesn't exist here. Default per-key limits are 60 RPM, 10K RPD, 10 concurrent requests, and a Rp1,000,000 monthly cap. Raise them in the dashboard.
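If you would rather stay under the default 60 RPM on the client side than handle rejections, a sliding-window limiter is one option. The sketch below is an assumption-laden illustration: RpmLimiter is a hypothetical class, it covers only the per-minute limit, and it is not thread-safe.

```python
import time
from collections import deque


class RpmLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._sent = deque()        # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before the next request (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call right after sending a request."""
        self._sent.append(self.clock())


# Usage:
#   limiter = RpmLimiter()
#   time.sleep(limiter.wait_time())
#   resp = client.chat.completions.create(...)
#   limiter.record()
```

The concurrency and daily limits would need separate handling, for example a semaphore for the 10 concurrent requests.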

7. Body size limits

OpenAI doesn't publish hard per-endpoint body size limits. Epithre's:

If you hit 413, see the error reference.

8. No fine-tuning yet

OpenAI offers self-serve fine-tuning. Epithre doesn't yet. For now: if you have a specific use case, email hello@epithre.com with the dataset description; we run custom training as a managed service while we build out the self-serve flow.

Pricing comparison

Approximate prices as of 2026-05. OpenAI prices change; check their pricing page for current figures.

| Workload | OpenAI (typical) | Epithre |
|---|---|---|
| GPT-4o input | $2.50 / 1M tok | epithre-omni: Rp7,000 / 1M tok |
| GPT-4o output | $10 / 1M tok | epithre-omni: Rp25,000 / 1M tok |
| text-embedding-3-large | $0.13 / 1M tok | epithre-embed: Rp1,500 / 1M tok |
| Batch discount | 50% | 50% |
| Prompt cache read | 50% of input | 10% of input |

So roughly 70-80% cheaper for chat/embed at list price. The bigger savings often come from prompt caching: cached reads cost 10% of the input rate instead of 50%, which is 5x cheaper.
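Because the two columns use different currencies, the percentage depends on the exchange rate you plug in. The sketch below shows the arithmetic; ASSUMED_IDR_PER_USD is a hypothetical placeholder rate, not a figure from this page, so replace it with the current rate before drawing conclusions.

```python
ASSUMED_IDR_PER_USD = 16_000  # hypothetical exchange rate; replace with the current one


def savings_fraction(openai_usd, epithre_idr, idr_per_usd=ASSUMED_IDR_PER_USD):
    """Fraction saved per 1M tokens when moving from the USD price to the IDR price."""
    epithre_usd = epithre_idr / idr_per_usd
    return 1 - epithre_usd / openai_usd


# (OpenAI USD per 1M tok, Epithre IDR per 1M tok), taken from the table above.
rows = {
    "chat input": (2.50, 7_000),
    "chat output": (10.00, 25_000),
    "embedding": (0.13, 1_500),
}

for name, (usd, idr) in rows.items():
    print(f"{name}: {savings_fraction(usd, idr):.0%} cheaper at the assumed rate")
```

Run it per row rather than relying on a single headline number, since the gap differs between chat and embeddings.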

Working example: full migration of a RAG service

# BEFORE (OpenAI)
import os

from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def embed_corpus(docs):
    r = client.embeddings.create(model="text-embedding-3-small", input=docs)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content

# AFTER (Epithre) - 4 line changes total
import os

from openai import OpenAI
client = OpenAI(
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",
)

def embed_corpus(docs):
    r = client.embeddings.create(model="epithre-embed", input=docs, dimensions=1024)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="epithre-lyt",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content

For RAG, you'd typically also add a rerank step using epithre-rerank for ~95%+ retrieval quality. See the RAG cookbook.

Migration checklist

That's it. Email hello@epithre.com if you hit any wall.