Migrate from Cohere
Currency note: Epithre prices in IDR (Rupiah). Competitor prices left in USD for comparison reference.
The migration has two parts. Our rerank endpoint is wire-compatible with Cohere's, so that part is just a base-URL change. Chat and embed require switching to the OpenAI request shape, which means more code changes, but they are mechanical.
Model mapping
| Cohere | Epithre | Notes |
|---|---|---|
| command-r-plus, command-r | epithre-omni | General chat / RAG / tool use. |
| command-light | epithre-lyt | Fast, cheap chat. |
| embed-multilingual-v3, embed-english-v3 | epithre-embed | 4000-dim, multimodal. |
| rerank-multilingual-v3, rerank-english-v3 | epithre-rerank | Cohere-wire-compatible. |
| classify-* | (no direct equivalent) | Use chat-completion with few-shot examples; see classification cookbook. |
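If Cohere model names are scattered through a codebase, the table above can be captured as a small lookup. This is a minimal sketch: the helper name `to_epithre_model` is hypothetical, and matching by prefix (so that versioned names such as `rerank-multilingual-v3.0` resolve) is an assumption about how your code spells model names.

```python
# Hypothetical helper mirroring the model mapping table above.
# Prefix matching lets versioned names like "embed-english-v3.0" resolve.
_MODEL_PREFIXES = [
    ("command-r-plus", "epithre-omni"),
    ("command-r", "epithre-omni"),
    ("command-light", "epithre-lyt"),
    ("embed-multilingual-v3", "epithre-embed"),
    ("embed-english-v3", "epithre-embed"),
    ("rerank-multilingual-v3", "epithre-rerank"),
    ("rerank-english-v3", "epithre-rerank"),
]


def to_epithre_model(cohere_model: str) -> str:
    """Return the Epithre model name for a Cohere model name."""
    if cohere_model.startswith("classify"):
        # The table lists no direct equivalent for classify-*.
        raise ValueError("classify-* has no direct equivalent; use few-shot chat")
    for prefix, target in _MODEL_PREFIXES:
        if cohere_model.startswith(prefix):
            return target
    raise KeyError(f"no mapping for Cohere model {cohere_model!r}")
```

Raising on `classify-*` rather than silently mapping it makes the one non-mechanical part of the migration visible at the call site.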
Rerank (drop-in)
Cohere customers who use /rerank get the easiest migration: same request shape, same response shape, just a base URL change.
Cohere:
import cohere

co = cohere.Client(api_key="...")
r = co.rerank(
    model="rerank-multilingual-v3.0",
    query="hutan lindung",
    documents=["UU 41/1999 ...", "Permen LHK ...", "resep nasi goreng"],
    top_n=2,
    return_documents=True,
)
Epithre:
import os

import httpx

r = httpx.post(
    "https://api.epithre.com/v1/rerank",
    headers={"Authorization": f"Bearer {os.environ['EPITHRE_KEY']}"},
    json={
        "model": "epithre-rerank",
        "query": "hutan lindung",
        "documents": ["UU 41/1999 ...", "Permen LHK ...", "resep nasi goreng"],
        "top_n": 2,
        "return_documents": True,
    },
).json()
# Same response shape: r["results"] = [{"index", "relevance_score", "document"?}, ...]
Optional instruction field for custom reranking criteria:
{"instruction": "Given a query about Indonesian environmental law, rank the most legally-authoritative documents first."}
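To show how the optional field might travel with a request, here is a minimal sketch of a payload builder. The source only shows the `instruction` field on its own, so placing it at the top level of the rerank body next to `query` is an assumption, and `build_rerank_payload` is a hypothetical name.

```python
from typing import Optional


def build_rerank_payload(query: str, documents: list[str], top_n: int,
                         instruction: Optional[str] = None) -> dict:
    """Build a JSON body for POST /v1/rerank.

    Assumption: `instruction` is a top-level key alongside `query`.
    """
    payload = {
        "model": "epithre-rerank",
        "query": query,
        "documents": documents,
        "top_n": top_n,
        "return_documents": True,
    }
    if instruction:
        # Only send the field when a custom criterion is wanted, so the
        # default request stays identical to the Cohere-compatible shape.
        payload["instruction"] = instruction
    return payload
```

The result can be passed as `json=build_rerank_payload(...)` to the same `httpx.post` call used for the plain rerank request.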
Embed
Cohere's embed accepts input_type ("search_document", "search_query", "classification", "clustering"). Epithre's epithre-embed uses a free-form instruction field instead.
Cohere:
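r = co.embed(
    model="embed-multilingual-v3.0",
    texts=texts,
    input_type="search_document",
    truncate="END",
)
vectors = r.embeddings
Epithre:
import os
from openai import OpenAI

# The OpenAI SDK, pointed at Epithre's base URL.
client = OpenAI(api_key=os.environ["EPITHRE_KEY"], base_url="https://api.epithre.com/v1")
r = client.embeddings.create(
    model="epithre-embed",
    input=texts,
    extra_body={"instruction": "Represent this document for retrieval:"},
)
vectors = [d.embedding for d in r.data]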
Mapping input_type to instruction:
| Cohere input_type | Epithre instruction |
|---|---|
| search_document | "Represent this document for retrieval:" |
| search_query | "Represent this query for retrieving relevant documents:" |
| classification | "Represent this text for classification:" |
| clustering | "Represent this text for clustering similar items:" |
The truncate field also works in Epithre, but with its own semantics: "END" (default, keeps the head), "START" (keeps the tail), and "NONE" (returns 422 if the input is too long). Truncation is character-based, not token-based.
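The input_type mapping table above can be kept in one place so call sites continue to pass Cohere-style values during the migration. A minimal sketch; `embed_request_kwargs` is a hypothetical helper, and it only builds keyword arguments rather than calling the API.

```python
# Instruction strings copied from the input_type mapping table.
INSTRUCTION_BY_INPUT_TYPE = {
    "search_document": "Represent this document for retrieval:",
    "search_query": "Represent this query for retrieving relevant documents:",
    "classification": "Represent this text for classification:",
    "clustering": "Represent this text for clustering similar items:",
}


def embed_request_kwargs(texts: list[str], input_type: str) -> dict:
    """Build kwargs for client.embeddings.create from a Cohere input_type."""
    try:
        instruction = INSTRUCTION_BY_INPUT_TYPE[input_type]
    except KeyError:
        raise ValueError(f"unknown input_type {input_type!r}") from None
    return {
        "model": "epithre-embed",
        "input": texts,
        "extra_body": {"instruction": instruction},
    }
```

Usage would look like `client.embeddings.create(**embed_request_kwargs(texts, "search_query"))`, with `client` being the OpenAI SDK client pointed at the Epithre base URL.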
Dimensions
Cohere embed returns 1024-dim by default for v3. Epithre epithre-embed returns 4000-dim. To get 1024-dim for compatibility:
r = client.embeddings.create(model="epithre-embed", input=texts, dimensions=1024)
The 1024-dim vector is exactly the first 1024 components of the native 4000-dim vector, re-normalized (a Matryoshka-style prefix).
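The slice-and-renormalize step described above can be sketched locally, which may help if you have already stored native vectors and want shorter ones without re-embedding. This is an illustration of the described behaviour, not Epithre's implementation; whether the local result matches the API output bit-for-bit is an assumption, and `shrink_embedding` is a hypothetical name.

```python
import math


def shrink_embedding(vector: list[float], dims: int = 1024) -> list[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if dims <= 0 or dims > len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]
```

Re-normalizing matters when similarity is computed with a dot product: without it, truncated vectors would have norms below 1 and scores would not be comparable.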
Chat
Cohere's chat differs significantly from OpenAI shape. Migrate to OpenAI-shape via the openai SDK.
Cohere:
r = co.chat(
    model="command-r-plus",
    message="Apa hukuman buat penebangan ilegal?",
    chat_history=[{"role": "USER", "message": "..."}, {"role": "CHATBOT", "message": "..."}],
    preamble="You are a legal assistant.",
    documents=[{"title": "UU 41/1999", "snippet": "..."}],
)
Epithre:
r = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "You are a legal assistant.\n\nContext:\n[UU 41/1999] ..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
        {"role": "user", "content": "Apa hukuman buat penebangan ilegal?"},
    ],
)
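The reshaping from Cohere's `message` / `chat_history` / `preamble` arguments into an OpenAI-style `messages` list is mechanical, so it can be done by a small adapter. A minimal sketch; `cohere_to_openai_messages` is a hypothetical helper, and the role mapping covers only the Cohere roles shown above plus `SYSTEM`.

```python
from typing import Optional

# Cohere chat_history roles mapped to OpenAI-style roles.
_ROLE_MAP = {"USER": "user", "CHATBOT": "assistant", "SYSTEM": "system"}


def cohere_to_openai_messages(message: str,
                              chat_history: Optional[list[dict]] = None,
                              preamble: Optional[str] = None) -> list[dict]:
    """Convert Cohere chat arguments into an OpenAI-style messages list."""
    messages = []
    if preamble:
        # The preamble becomes the system message.
        messages.append({"role": "system", "content": preamble})
    for turn in chat_history or []:
        role = _ROLE_MAP[turn["role"].upper()]
        messages.append({"role": role, "content": turn["message"]})
    # The current user message always goes last.
    messages.append({"role": "user", "content": message})
    return messages
```

The returned list can be passed directly as `messages=` to `client.chat.completions.create`.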
Cohere's documents parameter (with automatic grounding/citations) has no direct Epithre equivalent. The pattern is to include the retrieved context inline in the system or user message and, if you need citations, to ask the model to cite its sources explicitly.
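One way to approximate grounding is to number each document in the prompt and ask for bracketed references, then parse them back out of the answer. This is a sketch under stated assumptions: the prompt wording, the `[n]` citation convention, and both helper names are hypothetical, and the model is not guaranteed to follow the citation instruction.

```python
import re


def build_grounded_messages(question: str, documents: list[dict],
                            preamble: str) -> list[dict]:
    """Inline Cohere-style documents ({"title", "snippet"}) as numbered context."""
    context = "\n".join(
        f"[{i}] {d['title']}: {d['snippet']}"
        for i, d in enumerate(documents, start=1)
    )
    system = (
        f"{preamble}\n\nContext:\n{context}\n\n"
        "Cite the sources you rely on by their bracketed number, e.g. [1]."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


def cited_indices(answer: str, n_documents: int) -> list[int]:
    """Return sorted 0-based indices of documents cited as [n] in the answer."""
    found = {int(m) - 1 for m in re.findall(r"\[(\d+)\]", answer)}
    # Drop references to numbers that were never in the context.
    return sorted(i for i in found if 0 <= i < n_documents)
```

Filtering out-of-range numbers guards against the model inventing a source that was not supplied.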
Pricing comparison
Approximate, 2026-05:
| Workload | Cohere typical | Epithre |
|---|---|---|
| Command-R+ input | $2.50 / 1M tok | epithre-omni Rp7,000 / 1M tok |
| Command-R+ output | $10 / 1M tok | epithre-omni Rp25,000 / 1M tok |
| Embed v3 | $0.10 / 1M tok | epithre-embed Rp1,500 / 1M tok |
| Rerank v3 | $2 / 1000 searches | epithre-rerank Rp5,000 / 1000 docs |
Rerank pricing schemes differ: Cohere charges per "search unit" (a single query), while Epithre charges per document. With a typical 10 docs per query, Cohere's $2 / 1000 searches works out to about $200 / 1M docs, versus Epithre's Rp5,000,000 / 1M docs (about $280 / 1M). Per-document cost is roughly comparable, but rerank in Epithre is structurally simpler, since you can pre-batch hundreds of docs against one query.
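The per-document comparison above is simple arithmetic, and it is worth redoing with your own docs-per-query figure. A minimal sketch using the table's prices; the exchange rate is an input you must supply (the ~$280 figure above implies roughly Rp17,900 per USD, which is an inference, not a quoted rate), and all function names are hypothetical.

```python
def cohere_usd_per_million_docs(usd_per_1000_searches: float,
                                docs_per_query: int) -> float:
    """Cohere bills per query, so per-doc cost falls as docs_per_query rises."""
    usd_per_query = usd_per_1000_searches / 1000
    return usd_per_query / docs_per_query * 1_000_000


def epithre_idr_per_million_docs(idr_per_1000_docs: float) -> float:
    """Epithre bills per document, independent of how queries are batched."""
    return idr_per_1000_docs * 1000


def break_even_docs_per_query(usd_per_1000_searches: float,
                              idr_per_1000_docs: float,
                              idr_per_usd: float) -> float:
    """Docs per query at which one Cohere search costs the same as Epithre."""
    usd_per_query = usd_per_1000_searches / 1000
    usd_per_doc = idr_per_1000_docs / 1000 / idr_per_usd
    return usd_per_query / usd_per_doc
```

With the table's prices and an assumed rate near Rp17,900 per USD, the break-even is roughly 7 documents per query: below that, per-document billing comes out cheaper, and above it, per-search billing does.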
What Cohere has that Epithre doesn't
- Classify endpoint as a first-class API. Use Epithre chat + few-shot for the same result; slightly more tokens but typically equivalent quality.
- Compass (multimodal embed + LLM combined product). Use epithre-embed + epithre-omni separately for the same flow.
- Fine-tuning UI in their dashboard. Epithre offers fine-tuning via email for now.
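The few-shot replacement for the classify endpoint mentioned in the list above can be sketched as a message builder. The prompt wording and the helper name `build_classification_messages` are assumptions; the classification cookbook may recommend a different layout.

```python
def build_classification_messages(text: str,
                                  examples: list[tuple[str, str]],
                                  labels: list[str]) -> list[dict]:
    """Build few-shot chat messages that ask for exactly one label."""
    system = (
        "Classify the user's text into exactly one of these labels: "
        + ", ".join(labels)
        + ". Reply with the label only."
    )
    messages = [{"role": "system", "content": system}]
    for example_text, example_label in examples:
        if example_label not in labels:
            raise ValueError(f"example label {example_label!r} not in labels")
        # Each labelled example becomes a user/assistant turn pair.
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": example_label})
    messages.append({"role": "user", "content": text})
    return messages
```

The output goes to `client.chat.completions.create(model=..., messages=...)`. Checking that the reply is one of `labels` before using it is advisable, since a chat model can return free text.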
What Epithre has that Cohere doesn't
- Image generation + editing (epithre-iris). Cohere doesn't offer image generation.
- Indonesian-first models. Cohere's embed-multilingual is good but Epithre is purpose-built.
- Prompt caching with 90% read discount.
- Data residency Jakarta.
Working example: RAG pipeline migration
# BEFORE (Cohere)
import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

def index_docs(docs):
    # Embed the corpus once, as documents.
    return co.embed(model="embed-multilingual-v3.0", texts=docs,
                    input_type="search_document").embeddings

def query(question, indexed_docs, doc_vectors):
    # Embed the question as a query.
    qv = co.embed(model="embed-multilingual-v3.0", texts=[question],
                  input_type="search_query").embeddings[0]
    # ... cosine search ... (sets top_k_idx from qv and doc_vectors)
    candidates = [indexed_docs[i] for i in top_k_idx]
    # Rerank the candidates and keep the top 3.
    reranked = co.rerank(model="rerank-multilingual-v3.0",
                         query=question, documents=candidates,
                         top_n=3, return_documents=True)
    context = [r.document.text for r in reranked.results]
    # Grounded generation via Cohere's documents parameter.
    return co.chat(model="command-r-plus",
                   message=question,
                   preamble="Answer based on context.",
                   documents=[{"title": f"src_{i}", "snippet": c}
                              for i, c in enumerate(context)]).text
# AFTER (Epithre)
import os

import httpx
from openai import OpenAI

client = OpenAI(api_key=os.environ["EPITHRE_KEY"],
                base_url="https://api.epithre.com/v1")

def index_docs(docs):
    # Embed the corpus once, with the document instruction.
    r = client.embeddings.create(
        model="epithre-embed",
        input=docs,
        extra_body={"instruction": "Represent this document for retrieval:"},
    )
    return [d.embedding for d in r.data]

def query(question, indexed_docs, doc_vectors):
    # Embed the question with the query instruction.
    qv = client.embeddings.create(
        model="epithre-embed", input=[question],
        extra_body={"instruction": "Represent this query for retrieving relevant documents:"},
    ).data[0].embedding
    # ... cosine search ... (sets top_k_idx from qv and doc_vectors)
    candidates = [indexed_docs[i] for i in top_k_idx]
    # Rerank over plain HTTP; the body shape matches Cohere's.
    rerank_r = httpx.post(
        "https://api.epithre.com/v1/rerank",
        headers={"Authorization": f"Bearer {os.environ['EPITHRE_KEY']}"},
        json={"model": "epithre-rerank", "query": question, "documents": candidates,
              "top_n": 3, "return_documents": True},
    ).json()
    context = [r["document"]["text"] for r in rerank_r["results"]]
    # No documents parameter: the context goes inline in the user message.
    return client.chat.completions.create(
        model="epithre-omni",
        messages=[
            {"role": "system", "content": "Answer based on the provided context."},
            {"role": "user", "content": f"Context:\n{chr(10).join(context)}\n\nQ: {question}"},
        ],
    ).choices[0].message.content
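The `# ... cosine search ...` step is left elided in the example above. For small corpora, one possible stand-in is a brute-force scan in pure Python; this is an illustrative sketch rather than the step the original pipeline used, and `top_k_by_cosine` is a hypothetical name. A vector index would be the usual choice at scale.

```python
import math


def top_k_by_cosine(query_vec: list[float],
                    doc_vectors: list[list[float]],
                    k: int) -> list[int]:
    """Return indices of the k document vectors most similar to query_vec."""

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        # Treat zero vectors as having no similarity to anything.
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    scores = [cosine(query_vec, v) for v in doc_vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In the example, `top_k_idx = top_k_by_cosine(qv, doc_vectors, k)` would fill the gap, with `k` chosen larger than the rerank `top_n` so the reranker has candidates to reorder.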
Migration checklist
- [ ] Rerank: change URL, headers, keep same body shape. Zero-logic change.
- [ ] Embed: switch SDK or HTTP shape, map input_type to instruction text.
- [ ] Chat: switch to OpenAI shape, replace preamble with system message, replace documents with inline context.
- [ ] If using classify: re-implement as few-shot chat. See classification cookbook.
- [ ] Re-embed corpus with epithre-embed for native-quality vectors.
- [ ] Update rate-limit retry handling: same 429 pattern.
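For the retry item in the checklist above, a generic backoff wrapper might look like the following. The source only says the 429 pattern is the same, so the backoff schedule, the attempt limit, and the helper name `call_with_retry` are all assumptions; `send` is any zero-argument callable returning an object with a `status_code` attribute, such as an `httpx.Response`.

```python
import time
from typing import Any, Callable


def call_with_retry(send: Callable[[], Any],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    # Still rate-limited after the last attempt: hand back the 429 response.
    return response
```

Wrapping the rerank call would look like `call_with_retry(lambda: httpx.post(url, headers=headers, json=payload))`. The `sleep` parameter exists so tests can run without real delays.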
Email hello@epithre.com with subject "Cohere migration" if you hit unexpected differences.