Structured extraction from messy text

Use response_format with a json_schema in strict mode so extracted fields come back in a predictable, schema-conforming shape.

From chat messages

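from openai import OpenAI

# Assumed setup: an OpenAI-compatible client pointed at the Epithre endpoint,
# e.g. via the OPENAI_BASE_URL and OPENAI_API_KEY environment variables.
client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        # Accept +62..., 62... or 0..., matching what the system prompt asks for
        "phone": {"type": "string", "pattern": r"^(\+?62|0)[0-9\-]+$"},
        "city": {"type": ["string", "null"]},  # null when not mentioned
        "order_qty": {"type": ["integer", "null"], "minimum": 1},
        "product": {"type": "string"},
    },
    # OpenAI-style strict mode expects every property in "required";
    # fields that may be absent are made nullable instead.
    "required": ["name", "phone", "city", "order_qty", "product"],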
    "additionalProperties": False,
}

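# Indonesian chat: "Hi, I'm Andi, 081234567890. I'd like to order 5 sacks of urea fertilizer, delivered to Bekasi."
text = "Halo, gw Andi 081234567890. Mau pesan 5 sak pupuk urea dikirim ke Bekasi."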

resp = client.chat.completions.create(
    model="epithre-lyt",
    messages=[
        # "Extract the order details from the customer message. Phone must be in +62 or 0xxx format."
        {"role": "system", "content":
            "Ekstrak detail order dari pesan customer. Phone harus format +62 atau 0xxx."},
        {"role": "user", "content": text},
    ],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "order", "strict": True, "schema": schema}},
)
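The reply's message content is a JSON string that matches the schema. A minimal sketch of parsing it and normalizing the phone number to a single +62 form for storage; normalize_phone is a hypothetical helper, and the sample content below is illustrative, not real model output.

```python
import json
import re


def normalize_phone(raw: str) -> str:
    """Hypothetical helper: canonicalize an Indonesian number to +62 form.

    Accepts the prefixes the schema pattern allows (+62, 62, 0) and
    strips hyphens, spaces and the plus sign before rebuilding it.
    """
    digits = re.sub(r"[^0-9]", "", raw)  # keep digits only
    if digits.startswith("62"):
        return "+" + digits
    if digits.startswith("0"):
        return "+62" + digits[1:]  # leading trunk 0 becomes the country code
    raise ValueError(f"not an Indonesian number: {raw!r}")


# Illustrative reply content; in practice use resp.choices[0].message.content
content = ('{"name": "Andi", "phone": "081234567890", "city": "Bekasi", '
           '"order_qty": 5, "product": "pupuk urea"}')
order = json.loads(content)
order["phone"] = normalize_phone(order["phone"])
print(order["phone"])  # +6281234567890
```

Normalizing in code after extraction, rather than asking the model to rewrite numbers, keeps the stored value deterministic.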

From emails (with optional fields)

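When some fields may be missing from the input, make them nullable (e.g. "type": ["string", "null"]) so the model can return null instead of inventing a value. If the endpoint follows OpenAI's strict-mode rules, every property must still be listed in required; the nullable type, not omission from required, is what makes a field optional:

schema = {
    "type": "object",
    "properties": {
        "from_name":   {"type": ["string", "null"]},
        "from_email":  {"type": "string"},
        "subject":     {"type": "string"},
        "request_type": {"type": "string",
                         "enum": ["question", "complaint", "feature_request", "bug_report"]},
        "urgency":     {"type": "string", "enum": ["low", "medium", "high"]},
        "summary":     {"type": "string"},
    },
    "required": ["from_name", "from_email", "subject", "request_type", "urgency", "summary"],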
    "additionalProperties": False,
}
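Downstream code has to treat from_name as possibly null (None after json.loads). A minimal sketch; display_sender and the escalation rule are hypothetical, and the sample output is illustrative. Using dict.get also covers a schema variant where from_name is simply omitted rather than set to null.

```python
import json
from typing import Optional

# Illustrative structured output for an email that carries no display name
content = json.dumps({
    "from_name": None,
    "from_email": "budi@example.com",
    "subject": "Invoice error",
    "request_type": "complaint",
    "urgency": "high",
    "summary": "Customer was billed twice for the same order.",
})
ticket = json.loads(content)


def display_sender(t: dict) -> str:
    """Hypothetical helper: prefer the extracted name, else the bare address."""
    name: Optional[str] = t.get("from_name")  # JSON null -> None; missing key -> None
    return f"{name} <{t['from_email']}>" if name else t["from_email"]


# Hypothetical routing rule: high-urgency complaints and bug reports get escalated
needs_escalation = (ticket["urgency"] == "high"
                    and ticket["request_type"] in {"complaint", "bug_report"})
print(display_sender(ticket), needs_escalation)  # budi@example.com True
```

Because request_type and urgency are enums, routing logic like this can compare against a fixed set of strings instead of fuzzy-matching free text.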

From OCR output (KTP, KK, Akta)

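For Indonesian civil documents (KTP identity cards, KK family cards, Akta certificates), either pair this with Vision QA to OCR the image first and then run the extraction on the recognized text, or send the image directly to epithre-omni with vision input and a schema: one call instead of two.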
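A sketch of the one-call variant. It assumes epithre-omni accepts OpenAI-style image_url content parts carrying a base64 data URL; the KTP field names (nik, full_name, birth_date, address) and build_ktp_request are hypothetical. It only builds the request arguments, so it runs without network access.

```python
import base64


def build_ktp_request(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Hypothetical helper: kwargs for a single image -> JSON extraction call.

    Assumes OpenAI-style multimodal message parts; field names are illustrative.
    """
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    schema = {
        "type": "object",
        "properties": {
            "nik": {"type": "string", "pattern": r"^[0-9]{16}$"},  # 16-digit ID number
            "full_name": {"type": "string"},
            "birth_date": {"type": ["string", "null"]},
            "address": {"type": ["string", "null"]},
        },
        "required": ["nik", "full_name", "birth_date", "address"],
        "additionalProperties": False,
    }
    return {
        "model": "epithre-omni",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the KTP fields from this image."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "response_format": {"type": "json_schema",
                            "json_schema": {"name": "ktp", "strict": True, "schema": schema}},
    }


kwargs = build_ktp_request(b"\xff\xd8\xff")  # placeholder bytes, not a real image
# resp = client.chat.completions.create(**kwargs)
```

Keeping request construction in a pure function makes it easy to test the payload shape separately from the API call.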

Multi-record extraction

When the input contains multiple records (e.g. a chat history with several orders), wrap the per-record schema in an array under a top-level object:

schema = {
    "type": "object",
    "properties": {
        "orders": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "customer": {"type": "string"},
                    "product":  {"type": "string"},
                    "qty":      {"type": "integer"},
                    "city":     {"type": ["string", "null"]},
                },
                "required": ["customer", "product", "qty", "city"],
                "additionalProperties": False,
            }
        }
    },
    "required": ["orders"],
    "additionalProperties": False,
}
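The response is still one JSON object; the records sit under orders. A minimal sketch of unpacking and aggregating them, with illustrative sample content. Since the schema sets no minItems, an empty orders list is a valid result and the loop simply does nothing.

```python
import json
from collections import defaultdict

# Illustrative structured output for a chat history with several orders
content = json.dumps({"orders": [
    {"customer": "Andi", "product": "pupuk urea", "qty": 5, "city": "Bekasi"},
    {"customer": "Sari", "product": "pupuk urea", "qty": 2, "city": None},
    {"customer": "Andi", "product": "benih jagung", "qty": 10, "city": "Bekasi"},
]})
orders = json.loads(content)["orders"]

# Total quantity per product across all extracted records
totals = defaultdict(int)
for o in orders:
    totals[o["product"]] += o["qty"]

# Records the model could not place in a city, for follow-up
missing_city = [o["customer"] for o in orders if o["city"] is None]

print(dict(totals))   # {'pupuk urea': 7, 'benih jagung': 10}
print(missing_city)   # ['Sari']
```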

Handling extraction failures

The model occasionally can't fill the schema, e.g. when a required field is missing from the input. Even in strict mode it may then stop with finish_reason="length" and return a partial (truncated) response. Check finish_reason and fall back when it isn't "stop":

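import json

# client, schema and messages (system + user) as in the chat-message example
resp = client.chat.completions.create(
    model="epithre-lyt", messages=messages,
    response_format={"type": "json_schema",
                     "json_schema": {"name": "order", "strict": True, "schema": schema}},
)
choice = resp.choices[0]

if choice.finish_reason == "stop":
    data = json.loads(choice.message.content)
else:
    # Failed (e.g. cut off at "length"): retry in JSON mode, schema described in the prompt
    fallback = client.chat.completions.create(
        model="epithre-lyt",
        messages=[{"role": "system", "content": "Reply with JSON matching: " + json.dumps(schema)}] + messages,
        response_format={"type": "json_object"},  # no schema
    )
    data = json.loads(fallback.choices[0].message.content)  # post-validate manually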
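In OpenAI-style JSON mode only the JSON syntax is guaranteed, so the fallback result needs its own checks. A minimal, dependency-free sketch for the order schema; validate_order and its error strings are hypothetical, and it treats city and order_qty as optional because the fallback path is looser. If adding a dependency is acceptable, the jsonschema package's jsonschema.validate(instance, schema) can check the same schema dict generically.

```python
import re

# Hypothetical post-validator for the fallback path. It hand-codes the order
# schema's rules (with the +62/62/0 phone prefixes the system prompt asks for)
# instead of interpreting JSON Schema generically.
PHONE_RE = re.compile(r"^(\+?62|0)[0-9\-]+$")
ALLOWED = {"name", "phone", "city", "order_qty", "product"}
REQUIRED = {"name", "phone", "product"}


def validate_order(data):
    """Return a list of problems; an empty list means the record passed."""
    if not isinstance(data, dict):
        return ["top level is not an object"]
    errors = [f"missing field: {k}" for k in sorted(REQUIRED - set(data))]
    errors += [f"unexpected field: {k}" for k in sorted(set(data) - ALLOWED)]
    for k in ("name", "phone", "product"):
        if k in data and not isinstance(data[k], str):
            errors.append(f"{k} must be a string")
    if isinstance(data.get("phone"), str) and not PHONE_RE.match(data["phone"]):
        errors.append("phone does not match the +62/0 format")
    qty = data.get("order_qty")
    # bool is a subclass of int in Python, so reject it explicitly
    if qty is not None and (isinstance(qty, bool) or not isinstance(qty, int) or qty < 1):
        errors.append("order_qty must be an integer >= 1")
    if data.get("city") is not None and not isinstance(data["city"], str):
        errors.append("city must be a string or null")
    return errors


problems = validate_order({"name": "Andi", "phone": "081234567890", "product": "pupuk urea"})
print(problems)  # []
```

Returning a list of problems instead of raising on the first one makes it easy to log everything that went wrong with a fallback response in a single pass.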

See also