Translation pipeline: ID <-> EN
Translation is one place where Epithre's Indonesian tuning really shows. English-tuned models translate via English as an intermediate; Epithre models treat Indonesian as a first-class output language.
Simple one-shot translation
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Translate Indonesian to English. Return only the translation, no commentary."},
        {"role": "user", "content": "Jangan lupa makan malam, ya. Kalo kelar gawe, langsung pulang."},
    ],
)
# -> "Don't forget dinner. When you finish work, come straight home."
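The snippet above leaves the response handling implicit. Below is a minimal helper that sends one request and returns just the translated text. It assumes `client` is the OpenAI-compatible client used throughout this page. `translate` and its parameters are illustrative names, not part of Epithre's API.

```python
from typing import Any


def translate(client: Any, text: str, system: str, model: str = "epithre-omni") -> str:
    """Send one translation request and return only the translated text.

    `client` is assumed to expose chat.completions.create(...) as in the
    snippets on this page.
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
    )
    # The translation is the assistant message of the first choice.
    content = resp.choices[0].message.content or ""
    return content.strip()
```

For example: `translate(client, "Apa kabar?", system="Translate Indonesian to English. Return only the translation.")`.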
Preserve register
The hard part of translation is matching register. "Jangan lupa makan malam, ya" is casual, so the English should be casual too. A bad, overly formal translation would be: "Please do not forget your evening meal."
Add a register hint to the system prompt:
system = ("Translate Indonesian to English. Match the register of the input: "
          "if the input is casual, translate casually. If formal, translate formally. "
          "Preserve cultural context where possible. Return only the translation.")
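If you translate in both directions, it can help to generate this prompt rather than hand-write it per language pair. The builder below is a hypothetical helper. With `match_register=True`, it reproduces the register instruction from the prompt above word for word.

```python
def build_system_prompt(source: str, target: str, match_register: bool = True) -> str:
    """Compose a translation system prompt for any language pair (hypothetical helper)."""
    parts = [f"Translate {source} to {target}."]
    if match_register:
        # Same register instruction as the hand-written prompt above.
        parts.append(
            "Match the register of the input: if the input is casual, translate casually. "
            "If formal, translate formally."
        )
        parts.append("Preserve cultural context where possible.")
    parts.append("Return only the translation.")
    return " ".join(parts)
```

For example, `build_system_prompt("English", "Bahasa Indonesia")` produces a prompt for the reverse direction.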
Batch translation
If you have many strings to translate, use the Batch API (50% off). Give each request its own `custom_id` so results can be matched back to their inputs, and use JSON output so each result parses cleanly.
import json

inputs = [
    "Kami mohon maaf atas keterlambatan ini.",
    "Bro, jadinya jadi gak nih?",
    "Berdasarkan Pasal 4 ayat (1), tindakan tersebut tidak dapat dibenarkan.",
    # ...thousands
]

# One JSONL line per request; custom_id is what lets you match outputs back to inputs.
with open("translate.jsonl", "w", encoding="utf-8") as f:
    for i, text in enumerate(inputs):
        f.write(json.dumps({
            "custom_id": f"trans-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "epithre-lyt",
                "messages": [
                    {"role": "system", "content":
                        "Translate Indonesian to English. Match register. "
                        "Output JSON with one field 'translation'."},
                    {"role": "user", "content": text},
                ],
                "response_format": {"type": "json_object"},
            },
        }) + "\n")

# Open the file in a with-block so the handle is closed after upload.
with open("translate.jsonl", "rb") as f:
    input_file = client.files.create(file=f, purpose="batch_input")

batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # required if `client` is the OpenAI Python SDK; check Epithre's docs for supported values
)
# poll, download output, match by custom_id
epithre-lyt is sufficient for most translation work. Switch to epithre-omni when the content needs domain-specific care (legal, medical).
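The "poll, download output, match by custom_id" step can be sketched as follows. This sketch assumes Epithre's Batch API mirrors OpenAI's:

- `client.batches.retrieve` returns an object with a `status` string and an `output_file_id`.
- `client.files.content(...)` returns the output file.
- Each output line carries `custom_id` plus a `response` with `status_code` and `body`.

Verify these names against Epithre's reference before relying on them. The helper names are hypothetical.

```python
import json
import time
from typing import Any, Dict, Optional

# Status names assumed to follow OpenAI's Batch API; verify against Epithre's docs.
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client: Any, batch_id: str, poll_seconds: float = 30.0) -> Any:
    """Poll the batch until it reaches a terminal state, then return it."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text: str) -> Dict[str, Optional[str]]:
    """Map custom_id -> translation; None marks a failed or unparseable result."""
    results: Dict[str, Optional[str]] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            results[record["custom_id"]] = None
            continue
        # With response_format json_object, the message content is itself a JSON string.
        content = response["body"]["choices"][0]["message"]["content"]
        try:
            results[record["custom_id"]] = json.loads(content)["translation"]
        except (json.JSONDecodeError, KeyError, TypeError):
            results[record["custom_id"]] = None
    return results


# Usage with the batch created above (assumes OpenAI-style output_file_id / files.content):
# batch = wait_for_batch(client, batch.id)
# if batch.status == "completed" and batch.output_file_id:
#     by_id = parse_batch_output(client.files.content(batch.output_file_id).text)
#     # Missing ids (e.g. requests routed to an error file) come back as None.
#     english = [by_id.get(f"trans-{i}") for i in range(len(inputs))]
```

Indexing the results by `custom_id` instead of by line order matters because output lines are not guaranteed to come back in input order.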
Glossary / do-not-translate terms
For technical content with specific terminology, supply a glossary:
glossary = """
TERMS (do not translate these, keep as-is):
- Pasal -> "Pasal" (not "Article" for Indonesian legal references)
- BPK -> "BPK" (Badan Pemeriksa Keuangan)
- UU -> "UU"
PREFERRED translations:
- penyebaran -> "deployment" (not "spread")
- kewajiban -> "obligation" (not "duty")
"""
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": [
            {"type": "text",
             "text": "Translate Indonesian legal text to English.\n\n" + glossary,
             "cache_control": {"type": "ephemeral"}},
        ]},
        {"role": "user", "content": text},
    ],
)
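The `cache_control` marker lets the glossary prefix be cached and reused, so you avoid paying full price for it on each of thousands of requests. This only works while the system prompt stays identical from one request to the next.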
Reverse direction: EN -> ID
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content":
            "Translate English to Bahasa Indonesia. Use natural everyday phrasing "
            "(not literal word-for-word). Match register: casual stays casual, formal stays formal."},
        {"role": "user", "content":
            "We've decided to move forward with the proposal but need to discuss the budget further."},
    ],
)
# -> "Kami sudah putuskan untuk lanjut dengan proposal ini tapi perlu diskusi lagi soal budget."
Streaming translation (live UX)
stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Translate ID to EN. Output only the translation."},
        {"role": "user", "content": long_text},
    ],
    stream=True,
)
for chunk in stream:
    # Skip chunks with no choices (e.g. a trailing usage chunk) or no text delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
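A live UI usually needs both the incremental text and the finished string (for logging or caching). The wrapper below does both. It assumes chunks have the `choices[0].delta.content` shape used above. `stream_translation` and `on_text` are hypothetical names.

```python
from typing import Any, Callable, Iterable


def stream_translation(stream: Iterable[Any], on_text: Callable[[str], None]) -> str:
    """Forward each text delta to `on_text` as it arrives and return the full translation."""
    pieces = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            pieces.append(delta)
            on_text(delta)
    return "".join(pieces)
```

For example: `full = stream_translation(stream, lambda t: print(t, end="", flush=True))`.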
Quality testing
Build a small gold-standard set (20 to 50 sentence pairs) covering your real use cases. Rerun it periodically, for example whenever you change the prompt or the model:
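# translate() and evaluate() are your own functions: translate() wraps the API call,
# evaluate() scores the output against the reference translation.
gold = [
    ("Apa kabar?", "How are you?"),
    ("Pak, kapan mulai meetingnya?", "Sir, when does the meeting start?"),
    # ...
]

results = []
for src, expected in gold:
    output = translate(src)
    # use BLEU, ROUGE, or an LLM-as-judge prompt
    score = evaluate(output, expected)
    results.append((src, expected, output, score))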
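The `evaluate` function is left to you. One dependency-free option is sketched below: a simplified chrF-style character n-gram F-score. It averages precision and recall over character n-grams of length 1 to 6, then combines them with F-beta (beta = 2, weighting recall). Its scores will not match sacrebleu's chrF exactly, so use a library implementation for numbers you report or compare across tools. Partial word matches still earn credit, which makes this less brittle than exact-token overlap on short sentences like the ones in a gold set.

```python
from collections import Counter


def _char_ngrams(text: str, n: int) -> Counter:
    # Whitespace is dropped before extracting n-grams.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]: averaged char n-gram precision/recall, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = _char_ngrams(hypothesis, n), _char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is shorter than n characters
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)
```

To use it, pass `chrf` as `evaluate` in the loop above. Track the average over the whole gold set between runs rather than reading individual sentence scores.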