Legal document analysis (Bahasa Indonesia)
A demonstration of Epithre's strength on Indonesian legal text. We use:
- Knowledge upload to ingest statute PDFs into a per-customer index.
- Retrieval to find the relevant pasal (statute articles) for a query.
- Chat with epithre-omni (or epithre-prme for long context) to synthesize a cited answer.
Step 1: ingest your statute corpus
import os
import time
from openai import OpenAI

client = OpenAI(api_key=os.environ["EPITHRE_KEY"], base_url="https://api.epithre.com/v1")

# Upload PDFs of regulations
statute_files = [
    "regulations/UU_41_1999_kehutanan.pdf",
    "regulations/UU_32_2009_lingkungan_hidup.pdf",
    "regulations/PP_23_2021_penyelenggaraan_kehutanan.pdf",
    "regulations/Permen_LHK_92_2018_satwa.pdf",
]

file_ids = []
for path in statute_files:
    with open(path, "rb") as fh:  # close each PDF handle once it is uploaded
        r = client.files.create(file=fh, purpose="knowledge")
    file_ids.append(r.id)
    print(f"Uploaded {path} -> {r.id}")

# Poll until every file has been processed (indexed) server-side
for fid in file_ids:
    while True:
        f = client.files.retrieve(fid)
        if f.status == "processed":
            break
        if f.status == "error":
            raise RuntimeError(f"Processing failed for {fid}")
        time.sleep(5)
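The polling loop above waits forever if a file never leaves its pending state. A minimal sketch of a bounded poller with exponential backoff, assuming only that you pass in a zero-argument callable that returns the current status string ("processed", "error", or anything else meaning "still working"). The helper name wait_until_processed and the timing defaults are illustrative, not part of the Epithre or OpenAI SDK.

```python
import time
from typing import Callable


def wait_until_processed(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it returns "processed".

    Raises RuntimeError on "error" and TimeoutError if the next wait would
    pass the deadline. The delay doubles after each pending poll, capped
    at max_delay_s. `sleep` is injectable so the logic can be tested.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status == "processed":
            return
        if status == "error":
            raise RuntimeError("file processing failed")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

In Step 1 this would replace the inner while loop: for fid in file_ids: wait_until_processed(lambda: client.files.retrieve(fid).status). The lambda is called immediately inside the helper, so the usual late-binding pitfall with loop variables does not apply here.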
Step 2: retrieve relevant pasal for the query
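import httpx

# "What is the criminal penalty for encroaching on a protected forest without a permit?"
question = "Apa hukuman pidana untuk perambahan hutan lindung tanpa izin?"

retrieval_resp = httpx.post(
    "https://api.epithre.com/v1/retrieval",
    headers={"Authorization": f"Bearer {os.environ['EPITHRE_KEY']}"},
    json={
        "query": question,
        "top_k": 10,
        "file_ids": file_ids,
    },
    timeout=30.0,  # httpx defaults to 5 s, which can be short for retrieval
)
retrieval_resp.raise_for_status()  # fail loudly instead of on a missing "results" key
retrieved = retrieval_resp.json()["results"]

# Label each chunk with its source so the model can cite it
context = "\n\n".join(
    f"[{h['file_id']} chunk {h['chunk_index']}] {h['text']}"
    for h in retrieved
)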
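Raw file IDs in the chunk labels are hard to turn into a legal citation, and ten long chunks can crowd the prompt. A sketch of a context builder that labels each chunk with its source filename and stops at a character budget, assuming each hit has the file_id, chunk_index, and text keys used above. The function build_context and its id_to_name and max_chars parameters are illustrative; the budget counts characters, not tokens.

```python
from typing import Dict, List


def build_context(
    hits: List[dict],
    id_to_name: Dict[str, str],
    max_chars: int = 20_000,
) -> str:
    """Join retrieved chunks in the order returned, labeled by source file name.

    Chunks are added until the next one would push the total length past
    max_chars; the first chunk is always kept so the prompt is never empty.
    """
    parts: List[str] = []
    total = 0
    for h in hits:
        name = id_to_name.get(h["file_id"], h["file_id"])  # fall back to the raw ID
        block = f"[{name} chunk {h['chunk_index']}] {h['text']}"
        sep = 2 if parts else 0  # length of the "\n\n" separator
        if parts and total + sep + len(block) > max_chars:
            break
        parts.append(block)
        total += sep + len(block)
    return "\n\n".join(parts)
```

In the flow above, dict(zip(file_ids, statute_files)) gives the id_to_name mapping, because both lists are built in the same order during upload.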
Step 3: synthesize a cited answer
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        # System prompt (in Indonesian). Rules: cite the full legal basis (statute
        # name + number + year + Pasal + ayat); quote the relevant article verbatim;
        # say so honestly if the context is insufficient; never invent a legal basis
        # that is not in the context. Answer in three sections: DASAR HUKUM (legal
        # basis), JAWABAN (plain-language answer), PERINGATAN (caveats/exceptions).
        {"role": "system", "content": [
            {"type": "text", "text":
                ("Kamu asisten hukum Indonesia. Jawab berdasarkan konteks "
                 "peraturan yang diberikan. WAJIB:\n"
                 "1. Sebutkan dasar hukum lengkap: nama UU/PP/Permen + nomor + tahun + Pasal + ayat.\n"
                 "2. Kutip persis isi pasal yang relevan.\n"
                 "3. Kalau konteks tidak cukup untuk menjawab, bilang dengan jujur.\n"
                 "4. Jangan ngarang. Jangan tambah dasar hukum yang tidak ada di konteks.\n\n"
                 "Format jawaban:\n"
                 "DASAR HUKUM: <kutipan pasal lengkap>\n"
                 "JAWABAN: <penjelasan dengan bahasa awam>\n"
                 "PERINGATAN: <kalau ada nuansa atau pengecualian>"),
             "cache_control": {"type": "ephemeral"}},
        ]},
        {"role": "user", "content":
            f"KONTEKS PERATURAN:\n{context}\n\n"
            f"PERTANYAAN: {question}"},
    ],
)
print(resp.choices[0].message.content)
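The system prompt asks for three labeled sections, so the reply can be split mechanically before it reaches a UI or any citation checks. A minimal sketch, assuming the model keeps each label at the start of a line followed by a colon, as the prompt requests. parse_sections is a hypothetical helper; a missing section comes back as an empty string, which makes format drift easy to detect.

```python
import re
from typing import Dict

SECTION_LABELS = ("DASAR HUKUM", "JAWABAN", "PERINGATAN")
# A section label at the start of a line, followed by a colon
_LABEL_RE = re.compile(r"^(DASAR HUKUM|JAWABAN|PERINGATAN):", re.MULTILINE)


def parse_sections(reply: str) -> Dict[str, str]:
    """Split a reply into its DASAR HUKUM / JAWABAN / PERINGATAN sections."""
    sections = {label: "" for label in SECTION_LABELS}
    matches = list(_LABEL_RE.finditer(reply))
    for i, m in enumerate(matches):
        # Each section runs until the next label, or to the end of the reply
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        sections[m.group(1)] = reply[m.end():end].strip()
    return sections
```

Usage: sections = parse_sections(resp.choices[0].message.content); an empty sections["DASAR HUKUM"] is a signal to retry or flag the answer rather than show it.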
Example output:
DASAR HUKUM:
- UU No. 41 Tahun 1999 tentang Kehutanan, Pasal 50 ayat (3) huruf e:
"Setiap orang dilarang melakukan kegiatan pertambangan tanpa izin Menteri di kawasan hutan."
- Pasal 78 ayat (5): "Barangsiapa dengan sengaja melanggar... diancam dengan
pidana penjara paling lama 10 (sepuluh) tahun dan denda paling banyak
Rp 5.000.000.000,00."
JAWABAN:
Perambahan hutan lindung tanpa izin dapat dipidana penjara hingga 10 tahun
dan denda hingga Rp 5 miliar berdasarkan UU Kehutanan...
PERINGATAN: Hukuman dapat lebih berat jika perbuatan dilakukan secara
terorganisir atau menyebabkan kerusakan lingkungan yang signifikan,
yang diatur dalam UU 32/2009 tentang PPLH.
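The system prompt demands verbatim quotes, and the sample answer above both wraps a quote across lines and elides part of one with "...". A sketch of a check that every double-quoted passage in the answer actually occurs in the retrieved context, assuming quotes use straight double quotes. Whitespace is normalized, and the pieces between ellipses must each appear in the context, in order. unverified_quotes is a hypothetical helper; exact matching also flags harmless differences such as changed punctuation, so treat its output as items for review rather than proof of fabrication.

```python
import re
from typing import List


def _norm(s: str) -> str:
    """Collapse all whitespace runs (including line breaks) to single spaces."""
    return re.sub(r"\s+", " ", s).strip()


def unverified_quotes(answer: str, context: str) -> List[str]:
    """Return the double-quoted passages in `answer` that are not found in `context`.

    A quote may elide text with "..." or "…"; each remaining piece must occur
    in the context, in order, for the quote to count as verified.
    """
    ctx = _norm(context)
    missing = []
    for quote in re.findall(r'"([^"]+)"', answer):
        pos = 0
        ok = True
        for piece in re.split(r"\.\.\.|…", quote):
            piece = _norm(piece)
            if not piece:
                continue
            idx = ctx.find(piece, pos)
            if idx < 0:
                ok = False
                break
            pos = idx + len(piece)  # later pieces must come after this one
        if not ok:
            missing.append(_norm(quote))
    return missing
```

Pass the same context string that was sent to the model; checking against the whole corpus would let a quote from an unretrieved statute slip through.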
Why this works well
- epithre-embed is tuned on Indonesian legal text, so retrieval finds the right pasal even when the query is phrased differently from the statute.
- The system prompt is anchored: the model must cite its legal basis, must say honestly when the context is not enough ("tidak cukup"), and must not invent sources.
- The cache marker on the system prompt means follow-up queries in the same session are billed at the 10% cache-read rate for that prompt.
- epithre-omni has strong reasoning over the legal hierarchy (UU > PP > Permen, i.e. statute > government regulation > ministerial regulation) and over ayat (paragraph) structure.
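To put the 10% read rate in numbers, a sketch of the input-token cost of one session, assuming the first request pays full price for the system prompt and every follow-up reads it from cache at 10% of the normal input price. The function session_input_cost, the token counts you pass in, and the flat per-token price are placeholders; any cache-write surcharge Epithre may apply is ignored.

```python
def session_input_cost(
    system_tokens: int,
    per_query_tokens: int,
    n_queries: int,
    price_per_token: float,
    cache_read_rate: float = 0.10,
) -> float:
    """Input-token cost of n_queries requests sharing one cached system prompt.

    Query 1 pays full price for the system prompt; queries 2..n pay
    cache_read_rate times that. Per-query tokens (retrieved context plus
    question) are not cached and always pay full price.
    """
    if n_queries <= 0:
        return 0.0
    system_cost = system_tokens * price_per_token * (1 + (n_queries - 1) * cache_read_rate)
    uncached_cost = per_query_tokens * price_per_token * n_queries
    return system_cost + uncached_cost
```

Because only the system prompt carries the cache marker, the savings scale with its length; the retrieved context in each turn still bills at the full rate.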
For long-document follow-ups
If the user asks a follow-up that needs additional context (e.g. "kalau pelakunya korporasi, gimana?", "what if the offender is a corporation?"), you can either:
- Re-retrieve with the new query (broader context), or
- Pass the full statute text in one shot to epithre-prme (180K context) for deep analysis.
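Whether the one-shot route fits depends on the statute's size relative to epithre-prme's 180K context. A rough sketch of that decision, assuming "180K" means 180,000 tokens and using a crude characters-per-token estimate (about 4 is a common rule of thumb for English; Indonesian legal text may tokenize differently, so measure with the real tokenizer if you can). The names estimate_tokens, choose_strategy, and the reserve for prompt and answer are illustrative.

```python
import math

CONTEXT_TOKENS = 180_000  # epithre-prme context size as stated in this guide (assumed tokens)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from the character count."""
    return math.ceil(len(text) / chars_per_token)


def choose_strategy(full_text: str, reserve_tokens: int = 20_000) -> str:
    """Return "full_text" if the statute fits with room to spare, else "re_retrieve".

    reserve_tokens leaves space for the system prompt, the question, and
    the model's (possibly long, thinking-enabled) answer.
    """
    if estimate_tokens(full_text) + reserve_tokens <= CONTEXT_TOKENS:
        return "full_text"
    return "re_retrieve"
```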
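# Full-document deep dive.
# full_text: the complete statute as plain text. The .txt path is hypothetical;
# extract the text from the PDF with your own tooling first.
with open("regulations/UU_41_1999_kehutanan.txt", encoding="utf-8") as fh:
    full_text = fh.read()
# "What if the offender is a corporation?"
complex_question = "Kalau pelakunya korporasi, gimana?"

resp = client.chat.completions.create(
    model="epithre-prme",
    messages=[
        {"role": "system", "content": "Analisa hukum mendalam."},  # "In-depth legal analysis."
        {"role": "user", "content":
            f"Berikut UU 41/1999 lengkap (50 halaman):\n\n{full_text}\n\n"
            f"Pertanyaan: {complex_question}"},
    ],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(resp.choices[0].message.content)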
With thinking enabled, epithre-prme can produce dense cross-references across multiple pasal, comparable in quality to a junior lawyer's analysis.
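Reasoning text should be kept apart from the answer, especially before the citation checks run. If the endpoint returns the reasoning inline in the message content wrapped in <think>...</think> tags (an assumption: some OpenAI-compatible servers do this for thinking-enabled models while others return it in a separate field, so check what Epithre actually sends), a sketch that separates the two. split_thinking is a hypothetical helper.

```python
import re
from typing import Tuple

# Non-greedy match of one <think>...</think> block, spanning newlines
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content: str) -> Tuple[str, str]:
    """Split message content into (reasoning, answer).

    All <think> blocks are joined into the reasoning string; the answer is
    whatever remains once those blocks are removed. Without tags, reasoning
    is empty and the answer is the stripped content.
    """
    reasoning = "\n".join(m.strip() for m in _THINK_RE.findall(content))
    answer = _THINK_RE.sub("", content).strip()
    return reasoning, answer
```

Run citation validation on the answer part only: reasoning often mentions pasal the model considered and then rejected.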
Validating citations
Critical: the model occasionally hallucinates pasal numbers, especially for niche statutes. Always verify programmatically that every cited pasal exists in your corpus.
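import logging
import re

log = logging.getLogger(__name__)
model_output = resp.choices[0].message.content

# "Pasal <n> ayat (<m>)". \s+ (not [\w\s]*) so "Pasal 5 dan Pasal 6 ayat (2)" is not read as Pasal 5 ayat (2)
citations = re.findall(r"Pasal\s+(\d+)\s+ayat\s*\((\d+)\)", model_output)
for pasal, ayat in citations:
    # statute_index: your own (hypothetical) lookup built from the ingested corpus
    if not statute_index.has(pasal, ayat):
        log.warning(f"Citation Pasal {pasal} ayat ({ayat}) not in corpus, possible hallucination")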
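The loop above leaves statute_index to you. A minimal in-memory version, assuming you have plain text for each statute in which every article starts on its own line as "Pasal <n>" and each numbered paragraph starts with "(<m>)". That is the usual layout of Indonesian statutes, but PDF extraction can break it, so spot-check the result. StatuteIndex is a hypothetical class whose has() matches the call used above; it also records article-level entries so citations without an ayat can be checked.

```python
import re
from typing import Optional, Set, Tuple

_PASAL_RE = re.compile(r"^\s*Pasal\s+(\d+)\s*$")  # article heading on its own line
_AYAT_RE = re.compile(r"^\s*\((\d+)\)")           # numbered paragraph, e.g. "(3) ..."


class StatuteIndex:
    """Set of (pasal, ayat) pairs seen in ingested statute text."""

    def __init__(self) -> None:
        self._entries: Set[Tuple[str, Optional[str]]] = set()

    def add_text(self, text: str) -> None:
        """Scan statute text line by line, recording each Pasal and its ayat."""
        current: Optional[str] = None
        for line in text.splitlines():
            m = _PASAL_RE.match(line)
            if m:
                current = m.group(1)
                self._entries.add((current, None))
                continue
            m = _AYAT_RE.match(line)
            if m and current is not None:  # ignore "(n)" lines before any Pasal
                self._entries.add((current, m.group(1)))

    def has(self, pasal: str, ayat: Optional[str] = None) -> bool:
        """True if the article (and, when given, its paragraph) was seen."""
        return (pasal, ayat) in self._entries
```

Because the citation check above ignores which statute is named, a single shared index only catches article numbers that exist in none of your statutes. Keeping one StatuteIndex per file and also parsing the cited statute (e.g. "UU No. 41 Tahun 1999") makes the check much stricter.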