Structured extraction from messy text
Set response_format to a json_schema with strict mode enabled so that extracted fields come back as JSON matching a schema you define.
From chat messages
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        # Accepts +62..., 62..., or 0... prefixes, matching the system prompt
        "phone": {"type": "string", "pattern": r"^(\+?62|0)[0-9\-]+$"},
        "city": {"type": "string"},
        "order_qty": {"type": "integer", "minimum": 1},
        "product": {"type": "string"},
    },
    "required": ["name", "phone", "product"],
    "additionalProperties": False,
}
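# Sample customer message in informal Indonesian:
# "Hi, I'm Andi 081234567890. I'd like to order 5 sacks of urea fertilizer, delivered to Bekasi."
text = "Halo, gw Andi 081234567890. Mau pesan 5 sak pupuk urea dikirim ke Bekasi."
# `client` is assumed to be an already-configured SDK client.
resp = client.chat.completions.create(
    model="epithre-lyt",
    messages=[
        # System prompt (Indonesian): "Extract the order details from the
        # customer's message. Phone must be in +62 or 0xxx format."
        {"role": "system", "content":
            "Ekstrak detail order dari pesan customer. Phone harus format +62 atau 0xxx."},
        {"role": "user", "content": text},
    ],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "order", "strict": True, "schema": schema}},
)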
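Once the call returns, the JSON string still has to be decoded and, since the prompt allows both +62 and 0xxx phone formats, normalised to one form. The following is a minimal sketch of that step, not part of the API: `normalize_phone`, `parse_order`, and `sample_content` are hypothetical names, and `sample_content` stands in for `resp.choices[0].message.content`.

```python
import json
import re


def normalize_phone(raw: str) -> str:
    """Return an Indonesian number in +62 form (hypothetical helper)."""
    digits = re.sub(r"[^0-9]", "", raw)  # drop '+', '-', and spaces
    if digits.startswith("62"):
        return "+" + digits
    if digits.startswith("0"):
        return "+62" + digits[1:]  # replace the leading 0 with the country code
    raise ValueError(f"unrecognised phone format: {raw!r}")


def parse_order(content: str) -> dict:
    """Decode the model's JSON string and normalise the phone field."""
    order = json.loads(content)
    order["phone"] = normalize_phone(order["phone"])
    return order


# Stand-in for resp.choices[0].message.content (illustrative data only)
sample_content = (
    '{"name": "Andi", "phone": "081234567890", '
    '"product": "pupuk urea", "order_qty": 5, "city": "Bekasi"}'
)
order = parse_order(sample_content)
```

Normalising after extraction keeps the schema permissive about input formats while downstream code only ever sees one phone format.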
From emails (with optional fields)
When a field may be absent from the input, leave it out of required and allow null in its type:
schema = {
    "type": "object",
    "properties": {
        "from_name": {"type": ["string", "null"]},
        "from_email": {"type": "string"},
        "subject": {"type": "string"},
        "request_type": {"type": "string",
                         "enum": ["question", "complaint", "feature_request", "bug_report"]},
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["from_email", "subject", "request_type", "urgency", "summary"],
    "additionalProperties": False,
}
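On the consuming side, an optional nullable field can arrive in two shapes: the key may be missing entirely, or it may be present with a null value. A minimal sketch of handling both the same way follows; `EmailTicket` and `parse_ticket` are hypothetical names, and the sample strings are illustrative stand-ins for the model's output.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmailTicket:
    from_email: str
    subject: str
    request_type: str
    urgency: str
    summary: str
    from_name: Optional[str] = None  # missing key and explicit null both become None


def parse_ticket(content: str) -> EmailTicket:
    """Decode the model's JSON string into an EmailTicket."""
    data = json.loads(content)
    return EmailTicket(
        from_email=data["from_email"],      # required: a KeyError here means a bad response
        subject=data["subject"],
        request_type=data["request_type"],
        urgency=data["urgency"],
        summary=data["summary"],
        from_name=data.get("from_name"),    # optional: .get() tolerates a missing key
    )


# Illustrative responses: one with an explicit null, one without the key
with_null = parse_ticket(
    '{"from_name": null, "from_email": "a@example.com", "subject": "Login fails", '
    '"request_type": "bug_report", "urgency": "high", "summary": "Cannot log in."}'
)
without_key = parse_ticket(
    '{"from_email": "b@example.com", "subject": "Pricing", '
    '"request_type": "question", "urgency": "low", "summary": "Asks about plans."}'
)
```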
From OCR output (KTP, KK, Akta)
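For scanned Indonesian documents such as a KTP (national ID card), KK (family card), or Akta (civil registry certificate), there are two options. Pair this with Vision QA: OCR the image first, then extract from the resulting text. Or send the image directly to epithre-omni with a schema, which takes one call instead of two.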
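For the single-call route, the image has to travel inside the chat messages. The sketch below builds such a message list under one loud assumption: that the endpoint accepts OpenAI-style `image_url` content parts carrying a base64 data URL. That format is not confirmed by this document, and `build_image_messages` and its instruction text are hypothetical.

```python
import base64


def build_image_messages(image_bytes: bytes, mime_type: str = "image/jpeg") -> list:
    """Build chat messages carrying one image as a base64 data URL.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"role": "system",
         "content": "Extract the fields defined in the schema from the document image."},
        {"role": "user",
         "content": [
             {"type": "image_url",
              "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
         ]},
    ]


# Placeholder bytes; in practice read them with open(path, "rb").read()
messages = build_image_messages(b"\xff\xd8\xff placeholder image bytes")
```

The resulting list would be passed as `messages` together with the same kind of `response_format` used for text extraction.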
Multi-record extraction
When the input contains multiple records (e.g. a chat history with multiple orders), wrap the per-record schema in an array under a single top-level key:
schema = {
    "type": "object",
    "properties": {
        "orders": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "customer": {"type": "string"},
                    "product": {"type": "string"},
                    "qty": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["customer", "product", "qty"],
                "additionalProperties": False,
            }
        }
    },
    "required": ["orders"],
    "additionalProperties": False,
}
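With the records wrapped in an `orders` array, downstream code can iterate over them like any list. As a minimal sketch, the following totals quantities per product and collects customers whose optional `city` is missing; `summarize_orders` is a hypothetical helper and the sample data is invented for illustration.

```python
import json
from collections import defaultdict


def summarize_orders(content: str):
    """Return (quantity per product, customers with no city) from the model's JSON."""
    orders = json.loads(content)["orders"]
    totals = defaultdict(int)
    missing_city = []
    for order in orders:
        totals[order["product"]] += order["qty"]
        if not order.get("city"):  # city is optional in the schema
            missing_city.append(order["customer"])
    return dict(totals), missing_city


# Illustrative stand-in for resp.choices[0].message.content
sample_content = json.dumps({
    "orders": [
        {"customer": "Andi", "product": "pupuk urea", "qty": 5, "city": "Bekasi"},
        {"customer": "Budi", "product": "pupuk urea", "qty": 3},
        {"customer": "Sari", "product": "pupuk NPK", "qty": 2, "city": "Bogor"},
    ]
})
totals, missing_city = summarize_orders(sample_content)
```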
Handling extraction failures
The model occasionally cannot produce output that fits the schema, for example when a required field is missing from the input. In strict mode this may surface as finish_reason="length" or as a truncated response. Check finish_reason and fall back when it is not "stop":
resp = client.chat.completions.create(..., response_format=...)
choice = resp.choices[0]
if choice.finish_reason != "stop":
    # Extraction failed; fall back to looser mode
    fallback = client.chat.completions.create(
        ...,
        response_format={"type": "json_object"},  # no schema
    )
    # Post-validate manually
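Because the fallback mode enforces no schema, the checks the schema used to perform have to be done in code. The following is a minimal standard-library sketch of manual post-validation for the order fields from the chat example; `validate_order` and `PHONE_RE` are hypothetical names, and the phone pattern here accepts the +62 and 0xxx formats named in the system prompt.

```python
import json
import re

# Assumed pattern: +62..., 62..., or 0... prefixes, then digits or hyphens
PHONE_RE = re.compile(r"^(\+?62|0)[0-9\-]+$")


def validate_order(content: str):
    """Return (order, errors); order is None when any check fails."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    errors = []
    for field in ("name", "phone", "product"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty required field: {field}")

    phone = data.get("phone")
    if isinstance(phone, str) and phone.strip() and not PHONE_RE.match(phone):
        errors.append("phone does not match the expected format")

    qty = data.get("order_qty")
    # bool is a subclass of int in Python, so exclude it explicitly
    if qty is not None and (isinstance(qty, bool) or not isinstance(qty, int) or qty < 1):
        errors.append("order_qty must be an integer >= 1")

    return (data if not errors else None), errors


# Illustrative inputs: a clean response, a truncated one, and one missing a field
ok_order, ok_errors = validate_order(
    '{"name": "Andi", "phone": "081234567890", "product": "pupuk urea", "order_qty": 5}'
)
cut_order, cut_errors = validate_order('{"name": "Andi", "pho')
bad_order, bad_errors = validate_order('{"name": "Andi", "phone": "12345", "order_qty": 0}')
```

Returning a list of errors rather than raising on the first one makes it easier to log why an extraction was rejected or to ask the model for a retry.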