Translation pipeline: ID <-> EN

Translation is where Epithre's Indonesian tuning shows most clearly. Models tuned mainly on English tend to treat English as an intermediate language, even when the target is Indonesian; Epithre models treat Indonesian as a first-class output language.

Simple one-shot translation

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Translate Indonesian to English. Return only the translation, no commentary."},
        {"role": "user", "content": "Jangan lupa makan malam, ya. Kalo kelar gawe, langsung pulang."},
    ],
)
# -> "Don't forget dinner. When you finish work, come straight home."

Preserve register

The hard part of translation is matching register. "Jangan lupa makan malam" is casual, so the English should be casual too. A bad translation would be the stiff "Please do not forget your evening meal."

Add a register hint to the system prompt:

system = ("Translate Indonesian to English. Match the register of the input: "
          "if the input is casual, translate casually. If formal, translate formally. "
          "Preserve cultural context where possible. Return only the translation.")

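The register prompt can be wrapped in a small helper so every call uses the same wording. This is a minimal sketch, not an official Epithre helper: translate and REGISTER_PROMPT are names made up here, and reading the reply from resp.choices[0].message.content assumes the response has the usual chat-completions shape. FakeClient is a hypothetical stand-in so the example runs without network access.

```python
from types import SimpleNamespace

REGISTER_PROMPT = (
    "Translate Indonesian to English. Match the register of the input: "
    "if the input is casual, translate casually. If formal, translate formally. "
    "Preserve cultural context where possible. Return only the translation."
)


def translate(client, text, model="epithre-omni", system=REGISTER_PROMPT):
    """Send one string through the register-aware prompt and return the reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
    )
    # ASSUMPTION: the response follows the chat-completions object shape.
    return resp.choices[0].message.content.strip()


class FakeClient:
    """Hypothetical offline stand-in: records each request, returns a canned reply."""

    def __init__(self, reply):
        self._reply = reply
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake = FakeClient("  Don't forget dinner, okay?\n")
result = translate(fake, "Jangan lupa makan malam, ya.")
```

With a real client, the call would be translate(client, text); passing a different system string reuses the same helper for other prompts.
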
Batch translation

If you have many strings to translate, use the Batch API (50% off). Give each request a custom_id so results can be matched back to their inputs, and request JSON output so each translation is easy to parse.

import json

inputs = [
    "Kami mohon maaf atas keterlambatan ini.",
    "Bro, jadinya jadi gak nih?",
    "Berdasarkan Pasal 4 ayat (1), tindakan tersebut tidak dapat dibenarkan.",
    # ...thousands
]

with open("translate.jsonl", "w") as f:
    for i, text in enumerate(inputs):
        f.write(json.dumps({
            "custom_id": f"trans-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "epithre-lyt",
                "messages": [
                    {"role": "system", "content":
                        "Translate Indonesian to English. Match register. Output JSON with one field 'translation'."},
                    {"role": "user", "content": text},
                ],
                "response_format": {"type": "json_object"},
            }
        }) + "\n")

with open("translate.jsonl", "rb") as fh:  # close the file handle once the upload is done
    input_file = client.files.create(file=fh, purpose="batch_input")
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
)

# poll, download output, match by custom_id
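
The matching step can be sketched as follows. This is a hedged example, not the documented Epithre output format: it assumes each line of the downloaded output file is a JSON record carrying the request's custom_id and an OpenAI-style response.body.choices[0].message.content envelope. collect_translations and fake_result are hypothetical names, and the sample records are made up so the code runs offline.

```python
import json


def collect_translations(output_lines, n_inputs):
    """Return translations in input order; None where a result is missing or unusable."""
    by_id = {}
    for line in output_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        try:
            # ASSUMPTION: OpenAI-style batch result envelope.
            content = record["response"]["body"]["choices"][0]["message"]["content"]
            by_id[record["custom_id"]] = json.loads(content)["translation"]
        except (KeyError, IndexError, TypeError, json.JSONDecodeError):
            # Failed request, or the model returned something other than the JSON asked for.
            continue
    # custom_id was written as f"trans-{i}", so the index recovers the original order.
    return [by_id.get(f"trans-{i}") for i in range(n_inputs)]


def fake_result(custom_id, translation):
    """Build one made-up output line in the assumed shape."""
    content = json.dumps({"translation": translation})
    body = {"choices": [{"message": {"content": content}}]}
    return json.dumps({"custom_id": custom_id, "response": {"body": body}})


# Batch output is not guaranteed to arrive in input order, hence the lookup by id.
sample_output = [
    fake_result("trans-1", "Bro, is it happening or not?"),
    fake_result("trans-0", "We apologize for this delay."),
    json.dumps({"custom_id": "trans-2", "response": None}),  # simulated failure
]
aligned = collect_translations(sample_output, 3)
```

Entries that come back as None are the ones to retry or inspect by hand.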

epithre-lyt is sufficient for most translation. Use epithre-omni if your content needs domain-specific care (legal, medical).

Glossary / forbidden-terms

For technical content with specific terminology, supply a glossary:

glossary = """
TERMS (do not translate these, keep as-is):
- Pasal -> "Pasal"  (not "Article" for Indonesian legal references)
- BPK -> "BPK" (Badan Pemeriksa Keuangan)
- UU -> "UU"

PREFERRED translations:
- penyebaran -> "deployment" (not "spread")
- kewajiban -> "obligation" (not "duty")
"""

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": [
            {"type": "text",
             "text": "Translate Indonesian legal text to English.\n\n" + glossary,
             "cache_control": {"type": "ephemeral"}}
        ]},
        {"role": "user", "content": text},
    ],
)

The cache marker means the glossary is paid for once even across thousands of requests.
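
A glossary in the prompt is a request, not a guarantee, so it can help to check outputs after the fact. The sketch below is one possible check under simple assumptions: whole-word matching is good enough, and each kept term comes with a list of English renderings to reject. glossary_violations and KEEP_TERMS are hypothetical names, not part of any Epithre SDK.

```python
import re

# Kept term -> English renderings that should NOT replace it (illustrative values).
KEEP_TERMS = {
    "Pasal": ["Article"],
    "BPK": [],
    "UU": [],
}


def has_word(word, text, ignore_case=False):
    """True if `word` occurs in `text` as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(rf"\b{re.escape(word)}\b", text, flags) is not None


def glossary_violations(source, translation, keep_terms=KEEP_TERMS):
    """List glossary problems for one source/translation pair."""
    problems = []
    for term, rejected in keep_terms.items():
        if not has_word(term, source):
            continue  # term does not occur in this input, nothing to check
        if not has_word(term, translation):
            problems.append(f"kept term missing: {term}")
        for bad in rejected:
            if has_word(bad, translation, ignore_case=True):
                problems.append(f"rejected rendering used: {bad}")
    return problems


source = "Berdasarkan Pasal 4 ayat (1), tindakan tersebut tidak dapat dibenarkan."
bad = "Based on Article 4 paragraph (1), the action cannot be justified."
good = "Based on Pasal 4 paragraph (1), the action cannot be justified."

bad_report = glossary_violations(source, bad)
good_report = glossary_violations(source, good)
```

A non-empty report is a signal to retry the request or route the string to a human reviewer.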

Reverse direction: EN -> ID

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content":
            "Translate English to Bahasa Indonesia. Use natural everyday phrasing "
            "(not literal word-for-word). Match register: casual stays casual, formal stays formal."},
        {"role": "user", "content":
            "We've decided to move forward with the proposal but need to discuss the budget further."},
    ],
)
# -> "Kami sudah putuskan untuk lanjut dengan proposal ini tapi perlu diskusi lagi soal budget."

Streaming translation (live UX)

stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": "Translate ID to EN. Output only the translation."},
        {"role": "user", "content": long_text},
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
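
In a live UI you usually want the complete string once the stream ends, for example to store it or run checks on it. A minimal sketch, assuming chunks have the shape used in the loop above; consume_stream and fake_chunk are hypothetical names, and the fake stream stands in for a real response so the example runs offline. Skipping chunks with an empty choices list is a defensive assumption, not documented behavior.

```python
from types import SimpleNamespace


def consume_stream(stream, on_delta):
    """Forward each text delta to `on_delta` and return the full translation."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensive: tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skips None and empty strings
            on_delta(delta)
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a made-up chunk with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Don't forget "),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("dinner."),
]
shown = []
full_text = consume_stream(fake_stream, shown.append)
```

In a terminal, on_delta could be lambda s: print(s, end="", flush=True), which reproduces the loop above while still returning the final text.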

Quality testing

Build a small gold-standard set (20-50 sentence pairs) covering your real use cases. Run it periodically, for example after changing a prompt or model:

gold = [
    ("Apa kabar?", "How are you?"),
    ("Pak, kapan mulai meetingnya?", "Sir, when does the meeting start?"),
    # ...
]

results = []
for src, expected in gold:
    output = translate(src)  # translate() is your own wrapper around the API call
    # evaluate() is yours too: score with BLEU, ROUGE, or an LLM-as-judge
    score = evaluate(output, expected)
    results.append((src, expected, output, score))
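
The loop above leaves evaluate undefined. As a placeholder until a proper metric is wired in, here is a sketch that uses token-overlap F1. This is a deliberately crude stand-in for BLEU or ROUGE, not an implementation of either, and the 0.5 threshold is an arbitrary illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def evaluate(output, expected):
    """Token-overlap F1 between a model output and the reference, in [0, 1]."""
    out, exp = Counter(tokenize(output)), Counter(tokenize(expected))
    overlap = sum((out & exp).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(out.values())
    recall = overlap / sum(exp.values())
    return 2 * precision * recall / (precision + recall)


def failing(results, threshold=0.5):
    """Return the (src, expected, output, score) rows that score below threshold."""
    return [row for row in results if row[3] < threshold]


exact = evaluate("How are you?", "How are you?")
partial = evaluate("Sir, when will the meeting begin?", "Sir, when does the meeting start?")
none = evaluate("Good morning", "How are you?")
```

Overlap scores penalize valid paraphrases, so low-scoring rows are better treated as candidates for review than as confirmed failures.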

See also