Summarization with citations

This guide covers two summarization patterns: simple (compress the text) and grounded (each summary sentence cites the source passage it came from). For production, you usually want grounded summaries.

Simple summarization

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
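        # System prompt, in English: "Summarize the following article in
        # 3 paragraphs. Formal Indonesian."
        {"role": "system", "content":
            "Ringkas artikel berikut dalam 3 paragraf. Bahasa Indonesia formal."},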
        {"role": "user", "content": article_text},
    ],
)
print(resp.choices[0].message.content)

For long articles (>30K tokens), use epithre-prme:

resp = client.chat.completions.create(
    model="epithre-prme",
    messages=[
        # System prompt, in English: "Summarize the long document in 5 key points."
        {"role": "system", "content": "Ringkas dokumen panjang dalam 5 poin kunci."},
        {"role": "user", "content": long_document},
    ],
)
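The 30K-token cutoff above can be applied automatically before the request is made. The following is a minimal sketch, not part of the API: `estimate_tokens` and `pick_model` are hypothetical helpers, and the 4-characters-per-token ratio is an assumed rough average that will differ by language and tokenizer. Use a real tokenizer if you need an accurate count.

```python
LONG_DOC_THRESHOLD = 30_000  # tokens, taken from the guidance above


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed average."""
    return int(len(text) / chars_per_token)


def pick_model(text: str) -> str:
    """Return the long-document model only when the estimate exceeds the threshold."""
    if estimate_tokens(text) > LONG_DOC_THRESHOLD:
        return "epithre-prme"
    return "epithre-omni"
```

The result can be passed directly as the `model` argument, for example `model=pick_model(article_text)`.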

Summarization with citations

The pattern: split the document into chunks, label each chunk with a number, and ask the model to cite the chunk(s) each summary sentence comes from.

# 1. Split into labeled chunks
chunks = split_into_chunks(article, max_chars=1500)
labeled = "\n\n".join(f"[{i+1}] {c}" for i, c in enumerate(chunks))

# 2. Ask for cited summary
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
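        # System prompt, in English: "Summarize this document. Every summary
        # sentence must be followed by a citation in square brackets, e.g. [1]
        # or [2,3]. Use only information from the document; do not add
        # outside knowledge."
        {"role": "system", "content":
            "Ringkas dokumen ini. Setiap kalimat ringkasan harus diikuti "
            "dengan kutipan dalam kurung siku, contoh [1] atau [2,3]. "
            "Hanya gunakan informasi dari dokumen, jangan tambah pengetahuan luar."},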
        {"role": "user", "content": labeled},
    ],
)

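print(resp.choices[0].message.content)
# "Pemerintah menetapkan kebijakan baru terkait pajak [1]. Berdasarkan
#  data Q3, penerimaan pajak naik 12% YoY [2,4]. ..."
# In English: "The government set a new policy on taxes [1]. Based on
#  Q3 data, tax revenue rose 12% YoY [2,4]. ..."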
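The snippet above calls `split_into_chunks` without defining it. One possible implementation is sketched below, assuming paragraphs are separated by blank lines. It packs whole paragraphs greedily into chunks of at most `max_chars` characters and hard-splits any single paragraph that is longer than the limit. This is an illustrative helper, not a function provided by the SDK.

```python
def split_into_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # A paragraph longer than max_chars is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each labeled chunk readable on its own, which makes the cited numbers easier to check by hand. The hard split can cut mid-sentence, so a sentence-aware splitter may be preferable for documents with very long paragraphs.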

Citations as structured output

For machine parsing, structured output is cleaner:

schema = {
    "type": "object",
    "properties": {
        "summary_sentences": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "citations": {"type": "array", "items": {"type": "integer"}},
                },
                "required": ["text", "citations"],
                "additionalProperties": False,
            },
        },
        "key_topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary_sentences", "key_topics"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
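        # System prompt, in English: "Summarize the document as an array of
        # sentences. Each sentence has a citations array containing chunk indices."
        {"role": "system", "content":
            "Ringkas dokumen sebagai array kalimat. Tiap kalimat punya array citations berisi index chunk."},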
        {"role": "user", "content": labeled},
    ],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "summary", "strict": True, "schema": schema}},
)

import json
data = json.loads(resp.choices[0].message.content)
for s in data["summary_sentences"]:
    print(s["text"])
    print(f"  citing chunks: {s['citations']}")

Multi-doc summarization

When you have several related docs (e.g. news articles on the same event):

docs = [news1_text, news2_text, news3_text]
labeled = "\n\n".join(f"=== Dokumen {i+1} ===\n{d}" for i, d in enumerate(docs))

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
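        # System prompt, in English: "Summarize several documents on the same
        # topic. Include: (1) facts consistent across all sources,
        # (2) significant differences between sources, (3) points mentioned
        # by only one source. Include references to document numbers."
        {"role": "system", "content":
            "Ringkas beberapa dokumen tentang topik yang sama. Sertakan: "
            "(1) fakta yang konsisten di semua sumber, "
            "(2) perbedaan signifikan antar sumber, "
            "(3) hal yang hanya disebut satu sumber. "
            "Sertakan rujukan ke nomor dokumen."},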
        {"role": "user", "content": labeled},
    ],
)

For high-volume jobs that do not need realtime results, such as a daily news digest, use the Batch API for cost savings:

import json
from datetime import date

today = date.today()
articles = fetch_today_news()  # your news source

with open("daily.jsonl", "w") as f:
    for i, article in enumerate(articles):
        f.write(json.dumps({
            "custom_id": f"news-{today}-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "epithre-lyt",
                "messages": [
                    # System prompt, in English: "Summarize the article in 2 sentences."
                    {"role": "system", "content": "Ringkas artikel dalam 2 kalimat."},
                    {"role": "user", "content": article},
                ],
            }
        }) + "\n")

# Submit the file, then poll for completion or use a webhook for the batch.completed event

Batch requests are priced at 50% off realtime, and most batches complete in minutes.
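Once a batch finishes, each result has to be matched back to its article through `custom_id`. The sketch below assumes the output file is JSONL where every line carries `custom_id`, an `error` field, and a `response` object with `status_code` and a chat-completion `body`. That layout is an assumption modeled on the request format above, so check it against the actual batch output before relying on it. `collect_summaries` is a hypothetical helper name.

```python
import json


def collect_summaries(output_path: str) -> dict[str, str]:
    """Map custom_id -> summary text, skipping lines that report an error."""
    summaries: dict[str, str] = {}
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            response = record.get("response") or {}
            # Assumed layout: failed requests set "error" or a non-200 status.
            if record.get("error") or response.get("status_code") != 200:
                continue
            body = response["body"]
            summaries[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return summaries
```

Requests whose `custom_id` is missing from the returned dict failed or were skipped, and can be resubmitted in a later batch.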

Common pitfalls

Hallucination of facts not in the source: tell the model explicitly "Hanya gunakan informasi dari dokumen." ("Use only information from the document."). For high-stakes use cases, also pass temperature=0.1.
Length runaway: set max_tokens. A 1500-word article should summarize to 300-500 tokens, not 2000.
Citation drift: the model occasionally cites the wrong chunk number. For high-precision use cases, validate citations programmatically: in a post-processing step, check that each cited chunk actually supports the sentence.
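A first-pass check for citation drift can be sketched as follows. It takes the `summary_sentences` array from the structured-output example and the original `chunks` list, and flags sentences with no citations, citations outside the 1-based chunk range, or little word overlap with the cited chunks. `validate_citations` and the `min_overlap` threshold of 0.3 are illustrative assumptions. Word overlap is only a weak proxy for "supports the sentence", so a stricter pipeline would add a second model call or human review.

```python
import re


def validate_citations(sentences, chunks, min_overlap=0.3):
    """Return (sentence_index, problem) tuples for suspicious citations."""
    problems = []
    for i, s in enumerate(sentences):
        cited = s["citations"]
        if not cited:
            problems.append((i, "no citations"))
            continue
        # Chunk labels are 1-based, matching the "[{i+1}]" labels in the prompt.
        bad = [c for c in cited if not 1 <= c <= len(chunks)]
        if bad:
            problems.append((i, f"out-of-range citations: {bad}"))
            continue
        words = set(re.findall(r"\w+", s["text"].lower()))
        cited_words = set()
        for c in cited:
            cited_words |= set(re.findall(r"\w+", chunks[c - 1].lower()))
        # Share of the sentence's words that also appear in the cited chunks.
        overlap = len(words & cited_words) / len(words) if words else 0.0
        if overlap < min_overlap:
            problems.append((i, f"low lexical overlap: {overlap:.2f}"))
    return problems
```

Typical usage is `validate_citations(data["summary_sentences"], chunks)`. An empty list means no sentence was flagged, not that every citation is correct.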