Migrate from OpenAI
Currency note: Epithre prices in IDR (Rupiah). Competitor prices left in USD for comparison reference.
Most OpenAI customers can migrate in under 5 minutes. The wire format is identical; only the base URL, API key, and model IDs change.
TL;DR
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
resp = client.chat.completions.create(model="gpt-4o", messages=[...])
# After
from openai import OpenAI
client = OpenAI(
    api_key="esk_live_...",
    base_url="https://api.epithre.com/v1",
)
resp = client.chat.completions.create(model="epithre-omni", messages=[...])
Three lines: api_key, base_url, model ID.
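If you want both providers to stay switchable during the migration, the client settings can live in one place. This is a minimal sketch, not an official pattern: it assumes the environment variable names OPENAI_API_KEY and EPITHRE_KEY used in the RAG example later on this page, and client_kwargs and LLM_PROVIDER are hypothetical names that belong to neither SDK.

```python
import os

EPITHRE_BASE_URL = "https://api.epithre.com/v1"


def client_kwargs(provider: str) -> dict:
    """Return keyword arguments for OpenAI(...) for the chosen provider.

    `provider` is a hypothetical switch: "openai" or "epithre".
    """
    if provider == "epithre":
        return {
            "api_key": os.environ["EPITHRE_KEY"],
            "base_url": EPITHRE_BASE_URL,
        }
    if provider == "openai":
        # No base_url: the SDK falls back to its own default endpoint.
        return {"api_key": os.environ["OPENAI_API_KEY"]}
    raise ValueError(f"unknown provider: {provider!r}")


# Usage (requires the openai package; LLM_PROVIDER is a hypothetical variable):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs(os.environ.get("LLM_PROVIDER", "openai")))
```

The model ID still has to change per call, so a switch like this covers two of the three lines.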
Model mapping
| OpenAI model | Epithre equivalent | When to use which |
|---|---|---|
| gpt-4o, gpt-4.1 | epithre-omni | General multimodal chat. Same role: flagship default. |
| o1, o1-pro, o3-mini | epithre-prme | Long context (200K), reasoning. Pass extra_body={"chat_template_kwargs": {"enable_thinking": True}} for explicit reasoning chains. |
| gpt-4o-mini, gpt-3.5-turbo | epithre-lyt | Fast, cheap. Same role: high-throughput cost-sensitive workloads. |
| text-embedding-3-large | epithre-embed | 4000-dim, MRL-truncatable. Plus image embed in the same vector space (cross-modal). |
| text-embedding-3-small | epithre-embed + dimensions=1024 | Truncate to 1024-dim for smaller storage. |
| dall-e-3 | epithre-iris | Text-to-image. Plus multi-reference editing. |
| gpt-4o-realtime | (not available) | Real-time audio not yet offered. |
| whisper-1 | (not available) | Audio transcription not yet offered. |
| tts-1, tts-1-hd | (not available) | TTS not yet offered. |
If your workload uses Whisper, TTS, or Realtime, keep those on OpenAI for now and use Epithre for the chat/embed/image pieces.
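For codebases that pass model IDs around as strings, the table can be encoded as a lookup. The sketch below copies the IDs from the table; MODEL_MAP, UNAVAILABLE, ModelNotAvailable, and map_model are hypothetical names chosen for illustration.

```python
# OpenAI model ID -> Epithre model ID, copied from the mapping table.
MODEL_MAP = {
    "gpt-4o": "epithre-omni",
    "gpt-4.1": "epithre-omni",
    "o1": "epithre-prme",
    "o1-pro": "epithre-prme",
    "o3-mini": "epithre-prme",
    "gpt-4o-mini": "epithre-lyt",
    "gpt-3.5-turbo": "epithre-lyt",
    "text-embedding-3-large": "epithre-embed",
    "text-embedding-3-small": "epithre-embed",  # also pass dimensions=1024
    "dall-e-3": "epithre-iris",
}

# Models with no Epithre equivalent yet; these stay on OpenAI.
UNAVAILABLE = {"gpt-4o-realtime", "whisper-1", "tts-1", "tts-1-hd"}


class ModelNotAvailable(Exception):
    """Raised when an OpenAI model has no Epithre counterpart yet."""


def map_model(openai_model: str) -> str:
    """Translate an OpenAI model ID, failing loudly on unsupported ones."""
    if openai_model in UNAVAILABLE:
        raise ModelNotAvailable(f"{openai_model} has no Epithre equivalent yet")
    # An ID missing from the table raises KeyError rather than guessing.
    return MODEL_MAP[openai_model]
```

Raising on audio models instead of silently falling back makes it obvious which call sites must keep their OpenAI client.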
What's the same
The wire format is byte-identical. Every standard OpenAI parameter works:
- messages, temperature, top_p, max_tokens, stream
- tools, tool_choice (including "auto", "none", named tool choice)
- response_format (both json_object and json_schema strict mode)
- seed, stop, frequency_penalty, presence_penalty
- SSE streaming chunks have the same shape (data: {...} lines, final [DONE])
- Tool calls in messages have the same tool_calls / tool_call_id structure
- Vision via content: [{type:"image_url", image_url:{url:"data:..."}}]
- Embeddings response: data[].embedding, usage.prompt_tokens
- Error envelope: {"error": {"message", "type", "code"}} with the same HTTP status codes
If your code uses OpenAI's official SDK in any language, it'll work without modification beyond the three lines above.
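The official SDKs parse the stream for you, but a proxy or a custom HTTP client that reads raw SSE lines relies on exactly the shape listed above. A rough sketch of such a parser, assuming the standard OpenAI chunk layout (choices[].delta.content); iter_stream_text is a hypothetical helper name:

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from raw SSE lines of a chat completion stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Because the parser only depends on the data: prefix, the [DONE] sentinel, and the delta shape, the same code should serve either provider if the compatibility claim holds for your traffic.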
What's different
A handful of additions and quirks worth knowing:
1. Extended thinking is off by default
OpenAI's o1/o3 models default to internal reasoning. Epithre's epithre-omni and epithre-prme default to thinking off to keep latency predictable. Opt in:
resp = client.chat.completions.create(
    model="epithre-prme",
    messages=[...],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
When enabled, the model burns more tokens but produces visibly stronger reasoning on hard problems.
2. Prompt caching is explicit (Anthropic-style)
OpenAI auto-caches prefixes that repeat within a few minutes. Epithre uses an explicit marker so you control what's cached:
messages=[
    {"role": "system", "content": [
        {"type": "text",
         "text": "<long stable system prompt>",
         "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "user", "content": "..."},
]
Cached reads bill at 10% of input rate. See prompt caching guide.
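To avoid hand-writing the marker at every call site, the message construction can be wrapped in a helper. This is a small sketch built on the message shape shown above; cached_system_message and build_messages are hypothetical names.

```python
def cached_system_message(text: str) -> dict:
    """Build a system message whose text block carries the cache marker."""
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }


def build_messages(system_prompt: str, user_input: str) -> list:
    # Stable prefix first (marked for caching), variable user turn last.
    return [
        cached_system_message(system_prompt),
        {"role": "user", "content": user_input},
    ]
```

Keeping the cached block byte-stable between requests matters: any per-request value (timestamps, user names) interpolated into the system prompt would presumably defeat the cache.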
3. Embeddings have an instruction field
Optional task instruction prefix that improves retrieval quality for specific domains:
resp = client.embeddings.create(
    model="epithre-embed",
    input=texts,
    extra_body={"instruction": "Given a legal query, retrieve relevant Indonesian regulations"},
)
OpenAI embeddings have no equivalent. The field is optional; omit it for OpenAI-identical behavior.
4. epithre-embed is multimodal in one vector space
OpenAI's text-embedding-3-* is text-only. epithre-embed accepts both text strings and image objects in the same call, producing vectors that are directly cosine-comparable. Concrete pattern:
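import base64

# Read and base64-encode the image; the context manager closes the file.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()
resp = client.embeddings.create(
    model="epithre-embed",
    input=[
        "anak kucing oren tidur di kasur",  # Indonesian: "orange kitten sleeping on the bed"
        {"type": "image", "image": img_b64},  # extension to OpenAI shape
    ],
)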
# Both vectors are 4000-dim, L2-normalized, cosine-comparable.
See multimodal guide and cross-modal RAG recipe.
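Once text and image vectors share a space, cross-modal search reduces to ranking by cosine similarity. The sketch below uses only the standard library and toy 3-dim vectors as stand-ins for real 4000-dim embeddings; cosine and rank_by_similarity are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for L2-normalized vectors this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins: in practice query_vec comes from a text input and the
# candidates from image inputs embedded with the same model.
query_vec = [1.0, 0.0, 0.0]
image_vecs = {"cat.jpg": [0.8, 0.6, 0.0], "car.jpg": [0.0, 1.0, 0.0]}
ranking = rank_by_similarity(query_vec, image_vecs)
```

For a large corpus you would hand this ranking to a vector database instead of a Python loop; the scoring rule stays the same.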
5. Batch API uses the same shape, 50% discount
OpenAI's Batches API and Epithre's are wire-compatible. Migration is just the base URL change. Same JSONL input format (custom_id + method + url + body), same poll-or-webhook pattern, same 50% off list price.
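As an illustration of the JSONL input format, the sketch below serializes a set of prompts into one request per line using the four fields named above. build_batch_jsonl is a hypothetical helper, and the choice of epithre-lyt as the default model is only an example.

```python
import json
from typing import Dict


def build_batch_jsonl(prompts: Dict[str, str], model: str = "epithre-lyt") -> str:
    """Serialize {custom_id: prompt} into one JSONL request line per prompt."""
    lines = []
    for custom_id, prompt in prompts.items():
        request = {
            "custom_id": custom_id,  # echoed back so results can be matched
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        lines.append(json.dumps(request, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Write the returned string to a .jsonl file and upload it the same way you do for OpenAI batches; only the model IDs inside each body differ.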
6. Rate limits
OpenAI's tier system (Tier 1 through Tier 5) doesn't exist here. Default per-key limits are 60 RPM, 10K RPD, 10 concurrent requests, and a Rp1,000,000 monthly cap. Raise them in the dashboard.
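If your workload was tuned for a higher OpenAI tier, a client-side throttle can keep you under the default 60 RPM until the limit is raised. This is a minimal sliding-window sketch, assuming the limit is counted over a rolling 60-second window (the exact server-side accounting is not documented here); SlidingWindowLimiter is a hypothetical class.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def delay_needed(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a call is being made now."""
        self._stamps.append(self.clock())

    def wait(self, sleep=time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.delay_needed()
        if delay > 0:
            sleep(delay)
        self.record()
```

Call limiter.wait() before each request. This does not replace handling 429 responses, since the daily and concurrency limits are not modeled.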
7. Body size limits
OpenAI doesn't publish hard per-endpoint body size limits. Epithre's:
- /v1/chat/completions, /v1/rerank, etc.: 1 MB default
- /v1/embeddings: 25 MB (image inputs)
- /v1/images/*, /v1/files: 50 MB
If you hit 413, see the error reference.
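A pre-flight size check can catch oversized requests before they leave your process. The sketch below is an approximation under two assumptions that are labeled in the code: "MB" is read as decimal megabytes (the stricter interpretation), and json.dumps output is close in size to what your HTTP client actually sends. body_limit and fits are hypothetical helpers.

```python
import json

MB = 1_000_000  # assumption: decimal megabytes, the stricter reading

# (path prefix, limit in bytes) for endpoints with a non-default limit.
BODY_LIMITS = [
    ("/v1/embeddings", 25 * MB),
    ("/v1/images/", 50 * MB),
    ("/v1/files", 50 * MB),
]
DEFAULT_LIMIT = 1 * MB  # /v1/chat/completions, /v1/rerank, etc.


def body_limit(path: str) -> int:
    """Return the body size limit in bytes for an endpoint path."""
    for prefix, limit in BODY_LIMITS:
        if path.startswith(prefix):
            return limit
    return DEFAULT_LIMIT


def fits(path: str, payload: dict) -> bool:
    """True if the JSON-encoded payload is within the limit for `path`."""
    # Approximation: the real client may serialize slightly differently.
    size = len(json.dumps(payload).encode("utf-8"))
    return size <= body_limit(path)
```

Keep in mind that base64 encoding inflates binary data by about a third, so an image that looks small on disk can still push a chat request past the 1 MB default.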
8. No fine-tuning yet
OpenAI offers self-serve fine-tuning. Epithre doesn't yet. For now: if you have a specific use case, email hello@epithre.com with the dataset description; we run custom training as a managed service while we build out the self-serve flow.
Pricing comparison
Approximate prices as of 2026-05. OpenAI prices change; check their pricing page for current rates.
| Workload | OpenAI typical | Epithre |
|---|---|---|
| GPT-4o input | $2.50 / 1M tok | epithre-omni Rp7,000 / 1M tok |
| GPT-4o output | $10 / 1M tok | epithre-omni Rp25,000 / 1M tok |
| text-embedding-3-large | $0.13 / 1M tok | epithre-embed Rp1,500 / 1M tok |
| Batch discount | 50% | 50% |
| Prompt cache read | 50% of input | 10% of input |
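So roughly 70-80% cheaper for chat/embed at list price, depending on the USD/IDR exchange rate you convert at. The bigger savings often come from prompt caching: cached reads bill at 10% of the input rate versus 50%, so the cache read multiplier is 5x lower.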
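The effect of the cache read rate can be worked out without any currency conversion by computing a blended input price. The sketch below uses the list prices from the table and a hypothetical 80% cached share; it ignores any cache write charge, which this page does not describe. blended_input_price is a hypothetical helper.

```python
def blended_input_price(
    list_price: float, cached_fraction: float, cache_read_multiplier: float
) -> float:
    """Average price per 1M input tokens when part of the prompt is a cache read."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    uncached = (1.0 - cached_fraction) * list_price
    cached = cached_fraction * list_price * cache_read_multiplier
    return uncached + cached


# epithre-omni input: Rp7,000 / 1M tokens, cache reads at 10% of input rate.
epithre_rp = blended_input_price(7_000, 0.8, 0.10)
# GPT-4o input: $2.50 / 1M tokens, cache reads at 50% of input rate.
openai_usd = blended_input_price(2.50, 0.8, 0.50)

# Blended price as a share of each provider's own list price.
epithre_share = epithre_rp / 7_000
openai_share = openai_usd / 2.50
```

With 80% of prompt tokens served from cache, the blended input price drops to about 28% of list at a 10% read rate, compared with about 60% of list at a 50% read rate. The gap widens as the cached share grows.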
Working example: full migration of a RAG service
# BEFORE (OpenAI)
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def embed_corpus(docs):
    r = client.embeddings.create(model="text-embedding-3-small", input=docs)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content

# AFTER (Epithre) - 4 line changes total
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EPITHRE_KEY"],
    base_url="https://api.epithre.com/v1",
)

def embed_corpus(docs):
    r = client.embeddings.create(model="epithre-embed", input=docs, dimensions=1024)
    return [d.embedding for d in r.data]

def answer(question, contexts):
    return client.chat.completions.create(
        model="epithre-lyt",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context: {contexts}\n\nQ: {question}"},
        ],
    ).choices[0].message.content
For RAG, you'd typically also add a rerank step using epithre-rerank for ~95%+ retrieval quality. See the RAG cookbook.
Migration checklist
- [ ] Sign up at platform.epithre.com, verify email, claim Rp50,000 credit.
- [ ] Create a production API key. Store in your secret manager.
- [ ] Change api_key + base_url in your client init.
- [ ] Map model IDs: gpt-4o -> epithre-omni, text-embedding-3-* -> epithre-embed.
- [ ] Run your existing eval suite against Epithre. Compare quality on Indonesian-heavy inputs.
- [ ] (Optional) Replace OpenAI auto-prompt-cache with Epithre explicit cache_control on long system prompts.
- [ ] (Optional) If you have an Indonesian RAG corpus, re-embed it with epithre-embed for native-quality vectors. Existing OpenAI embeddings don't live in the same vector space.
- [ ] Update monitoring: error type names are the same, but you may want to track Epithre-specific fields like usage.cache_read_input_tokens.
- [ ] Update rate limit handling: same 429 pattern, but new per-key cap defaults.
That's it. Email hello@epithre.com if you hit any wall.