Agentic multi-step chains

Run the model and its tools in a loop until the model produces a final answer. This pattern suits customer-support bots, research agents, and codebase navigators.

Core loop

import json

# Assumes `client` is an OpenAI-compatible chat client already configured for your
# provider (the loop relies on chat.completions.create and message.tool_calls).

def run_agent(user_message, tools, tool_handlers, max_iter=8):
    messages = [
        {"role": "system", "content": "You are a helpful assistant with tools."},
        {"role": "user", "content": user_message},
    ]

    for _ in range(max_iter):
        resp = client.chat.completions.create(
            model="epithre-omni",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message
        messages.append(msg)  # keep the assistant turn (including any tool_calls) in history

        if not msg.tool_calls:
            # No tool calls: this is the final answer
            return msg.content

        # The model may request several tool calls in one turn; run each one and
        # send back one role="tool" message per call, matched by tool_call_id
        for call in msg.tool_calls:
            try:
                args = json.loads(call.function.arguments)
                result = tool_handlers[call.function.name](**args)
                content = result if isinstance(result, str) else json.dumps(result)
            except Exception as e:
                # Bad JSON, unknown tool, wrong arguments, or handler failure: report it to the model
                content = json.dumps({"error": str(e)})
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": content,
            })

    raise RuntimeError("Max iterations exceeded; agent didn't converge.")
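The loop above runs tool calls one after another, even when the model requests several in a single turn. If your handlers are I/O-bound (database lookups, HTTP calls), one option is to run them concurrently with a thread pool. This is a sketch, not part of the original loop: `execute_calls` and `_run_one` are hypothetical helpers, the demo handlers are made up, and `SimpleNamespace` objects stand in for the SDK's tool-call objects (which expose `.id`, `.function.name`, and `.function.arguments`).

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def _run_one(call, tool_handlers):
    """Run one tool call and build its role="tool" message (same error handling as run_agent)."""
    try:
        args = json.loads(call.function.arguments)
        result = tool_handlers[call.function.name](**args)
        content = result if isinstance(result, str) else json.dumps(result)
    except Exception as e:
        content = json.dumps({"error": str(e)})
    return {"role": "tool", "tool_call_id": call.id, "content": content}


def execute_calls(tool_calls, tool_handlers, max_workers=4):
    """Run all tool calls from one assistant turn concurrently.

    pool.map yields results in input order, so the tool messages come back
    in the same order as the model's tool_calls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: _run_one(c, tool_handlers), tool_calls))


# Stand-in tool-call objects and hypothetical handlers for a quick check.
def make_call(call_id, name, args):
    return SimpleNamespace(id=call_id,
                           function=SimpleNamespace(name=name, arguments=json.dumps(args)))


demo_handlers = {
    "add": lambda a, b: {"sum": a + b},
    "echo": lambda text: text,
}
calls = [
    make_call("call_1", "add", {"a": 2, "b": 3}),
    make_call("call_2", "echo", {"text": "hi"}),
    make_call("call_3", "missing_tool", {}),
]
tool_messages = execute_calls(calls, demo_handlers)
for m in tool_messages:
    print(m["tool_call_id"], m["content"])
```

Inside `run_agent`, the inner `for call in msg.tool_calls:` block would become `messages.extend(execute_calls(msg.tool_calls, tool_handlers))`. This only pays off if your handlers are thread-safe and spend most of their time waiting on I/O.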

Example: support agent with order lookup + shipping check

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Find order details by order number.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_shipping",
            "description": "Get current shipping status for a tracking number.",
            "parameters": {
                "type": "object",
                "properties": {"tracking_number": {"type": "string"}},
                "required": ["tracking_number"],
            },
        },
    },
]

# db and shipping_api are placeholders for your own data-access and carrier clients
handlers = {
    "lookup_order":   lambda order_id: db.get_order(order_id),
    "check_shipping": lambda tracking_number: shipping_api.status(tracking_number),
}

answer = run_agent(
    "Where is my order ORD-12345 right now?",
    tools, handlers,
)

A typical run: the model calls lookup_order("ORD-12345"), reads the tracking number from the result, calls check_shipping(...) with it, reads the status, and returns a natural-language answer.
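To see how data flows between the two tools without a live model, you can play the model's part by hand and feed the first tool's output into the second. Everything here is hypothetical: `FAKE_ORDERS`, `FAKE_SHIPMENTS`, the tracking number, and the status values stand in for `db.get_order` and `shipping_api.status`, and the `tracking_number` field in the order record is an assumption about your schema.

```python
import json

# Hypothetical in-memory stand-ins for db.get_order and shipping_api.status.
FAKE_ORDERS = {"ORD-12345": {"order_id": "ORD-12345", "tracking_number": "TRK-0001"}}
FAKE_SHIPMENTS = {"TRK-0001": {"status": "in_transit", "location": "Sorting hub"}}

fake_handlers = {
    "lookup_order": lambda order_id: FAKE_ORDERS.get(order_id)
        or {"error": f"Order {order_id} not found."},
    "check_shipping": lambda tracking_number: FAKE_SHIPMENTS.get(tracking_number)
        or {"error": f"Tracking number {tracking_number} not found."},
}


def dispatch(name, arguments_json):
    """Run one tool call the way run_agent does: JSON arguments in, JSON string out."""
    result = fake_handlers[name](**json.loads(arguments_json))
    return result if isinstance(result, str) else json.dumps(result)


# Step 1: the model would emit lookup_order with {"order_id": "ORD-12345"}.
order = json.loads(dispatch("lookup_order", '{"order_id": "ORD-12345"}'))

# Step 2: it reads tracking_number from that result and calls check_shipping.
shipping = json.loads(dispatch("check_shipping",
                               json.dumps({"tracking_number": order["tracking_number"]})))
print(shipping)  # {'status': 'in_transit', 'location': 'Sorting hub'}
```

Swapping fakes like these into `handlers` is also a cheap way to unit-test tool wiring before pointing the agent at real systems.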

Recovery patterns

Bad arguments

The model may pass invalid arguments, such as a typo or a made-up ID. Have the handler detect this and return an error as the tool result; the model usually recovers:

def lookup_order(order_id):
    record = db.get_order(order_id)
    if not record:
        return {"error": f"Order {order_id} not found. Please re-check the order number."}
    return record

The model sees this error in the tool result and typically either asks the user to confirm the order number or tries a different approach.
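Some bad arguments can be caught before the handler runs at all, by checking them against the tool's own `parameters` schema. A minimal sketch, assuming flat object schemas like the ones in this guide (`properties`, `required`, primitive `type`s); `check_args` is a hypothetical helper, and a full JSON Schema validator such as the `jsonschema` package is the more complete option.

```python
# Python types accepted for each JSON Schema primitive type (a subset).
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def check_args(schema, args):
    """Return an error message, or None if args fit a flat object schema."""
    props = schema.get("properties", {})
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        return f"Missing required argument(s): {', '.join(missing)}."
    unknown = [k for k in args if k not in props]
    if unknown:
        return f"Unknown argument(s): {', '.join(unknown)}. Allowed: {', '.join(props)}."
    for key, value in args.items():
        expected = props[key].get("type")
        py_type = _TYPES.get(expected)
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if py_type and (not isinstance(value, py_type) or
                        (expected in ("integer", "number") and isinstance(value, bool))):
            return f"Argument '{key}' must be of type {expected}."
    return None


lookup_order_schema = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}},
    "required": ["order_id"],
}

print(check_args(lookup_order_schema, {"order_id": "ORD-12345"}))  # None
print(check_args(lookup_order_schema, {"orderId": "ORD-12345"}))   # Missing required argument(s): order_id.
print(check_args(lookup_order_schema, {"order_id": 12345}))        # Argument 'order_id' must be of type string.
```

In the agent loop, a non-None result would be sent back as `json.dumps({"error": message})` in place of calling the handler, which gives the model a more specific hint than a raw `TypeError`.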

Tool loop (same call twice in a row)

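last_call = None
for _ in range(max_iter):
    resp = client.chat.completions.create(
        model="epithre-omni",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = resp.choices[0].message

    if msg.tool_calls and len(msg.tool_calls) == 1:
        sig = (msg.tool_calls[0].function.name,
               msg.tool_calls[0].function.arguments)
        if sig == last_call:
            # Stuck. Don't append msg: in OpenAI-style APIs an assistant message
            # with tool_calls must be followed by a result for each call.
            # Instead, nudge the model toward a final answer.
            messages.append({"role": "user", "content":
                "You're repeating the same tool call. Please give your best answer based on what you have."})
            continue
        last_call = sig
    # ... rest of loop (append msg, run tools, return on final answer)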
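Comparing raw `arguments` strings misses repeats where the model re-serializes the same JSON with different key order or spacing, and the check above only looks at single-call turns. A sketch of a sturdier signature, assuming the same tool-call shape (`.function.name`, `.function.arguments`); `call_signature` and `fake_call` are hypothetical helpers, with `SimpleNamespace` standing in for SDK objects.

```python
import json
from types import SimpleNamespace


def call_signature(tool_calls):
    """Order-independent, whitespace-insensitive signature for one turn's tool calls."""
    sigs = []
    for call in tool_calls:
        try:
            # Re-serialize with sorted keys so {"a":1,"b":2} and {"b": 2, "a": 1} match.
            args = json.dumps(json.loads(call.function.arguments), sort_keys=True)
        except json.JSONDecodeError:
            args = call.function.arguments  # malformed JSON: fall back to the raw string
        sigs.append((call.function.name, args))
    return tuple(sorted(sigs))


def fake_call(name, arguments):
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


turn_1 = [fake_call("lookup_order", '{"order_id": "ORD-12345"}')]
turn_2 = [fake_call("lookup_order", '{ "order_id":"ORD-12345" }')]
turn_3 = [fake_call("lookup_order", '{"order_id": "ORD-99999"}')]

print(call_signature(turn_1) == call_signature(turn_2))  # True
print(call_signature(turn_1) == call_signature(turn_3))  # False
```

In the loop, `if msg.tool_calls: sig = call_signature(msg.tool_calls)` would replace the single-call check, so repeated multi-call turns are caught too.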

Token budget

Long tool chains can overflow the context window. After each round, you can trim or summarize older messages, most of which are bulky tool results:

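if len(messages) > 20:
    # Keep the system prompt + roughly the last 10 messages; summarize the middle.
    cut = len(messages) - 10
    # Don't start the kept tail on a tool result cut off from its assistant tool_calls message.
    # History mixes plain dicts and SDK message objects, hence the isinstance check.
    while cut > 1 and (messages[cut]["role"] if isinstance(messages[cut], dict) else messages[cut].role) == "tool":
        cut -= 1
    summary = summarize_older(messages[1:cut])  # summarize_older: your own helper, not a library call
    messages = [messages[0], {"role": "system", "content": f"Earlier conversation summary: {summary}"}, *messages[cut:]]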
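The snippet above summarizes; the other option the paragraph mentions is trimming. One way to trim without breaking the conversation is to shorten the content of older tool messages in place rather than deleting them, so every `tool_call_id` still has its result. A sketch under those assumptions: `trim_old_tool_results`, its defaults, and the toy history are illustrative, and only dict-shaped tool messages (as `run_agent` appends them) are touched.

```python
def trim_old_tool_results(messages, keep_last=10, max_chars=200):
    """Return a copy of messages with long, older tool results shortened.

    Only dict messages with role "tool" outside the last `keep_last` entries
    are touched. They are shortened, not removed, so each tool_call_id still
    answers its assistant tool_calls message.
    """
    cutoff = len(messages) - keep_last
    trimmed = []
    for i, m in enumerate(messages):
        is_old_tool = i < cutoff and isinstance(m, dict) and m.get("role") == "tool"
        if is_old_tool and len(m["content"]) > max_chars:
            dropped = len(m["content"]) - max_chars
            m = {**m, "content": m["content"][:max_chars] + f"... [trimmed {dropped} chars]"}
        trimmed.append(m)
    return trimmed


# Toy history: one long tool result followed by ten newer messages.
history = [
    {"role": "system", "content": "You are a helpful assistant with tools."},
    {"role": "user", "content": "Where is ORD-12345?"},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "lookup_order", "arguments": '{"order_id": "ORD-12345"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "x" * 1000},
] + [{"role": "user", "content": f"follow-up {i}"} for i in range(10)]

new_history = trim_old_tool_results(history)
print(new_history[3]["content"][-30:])
```

Trimming is lossy but cheap and deterministic; summarizing costs an extra model call but can keep facts that a blind truncation would cut.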

When to use which model

epithre-omni: most agentic chains. Good tool use and strong reasoning.
epithre-prme: long-context agents (e.g. a codebase navigator with the full file tree in context).
epithre-lyt: NOT recommended for tool loops; less reliable at multi-step tool reasoning.

Cost control

Cap iterations (8 is typical).
Prefer tool_choice="auto" for cost-sensitive loops; it has a slightly lower time to first token (TTFT). The earlier stall with tool_choice="required" on epithre-omni was fixed in May 2026, and "required" now works reliably across all backends.
Compress tool results: don't return 100 KB of JSON; summarize or truncate it first.
Use cache_control on the system prompt: at 5 or more iterations per conversation, this saves about 80% on the cached portion. Note that the top-level tools field isn't covered by cache_control markers; to get tool-spec tokens cached, inline the tool documentation as text in the system prompt and keep tools for schema enforcement. See Pattern 4 in the prompt caching guide.
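For the "compress tool results" item, one approach is to cap each result's serialized size before it is appended as a role "tool" message. A sketch, assuming a character budget is a good-enough proxy for tokens (it is only approximate); `compact_result` and the 4,000-character default are illustrative, not recommendations from this guide. Returning a small JSON wrapper instead of cutting the JSON mid-string keeps the content parseable and tells the model the result was truncated.

```python
import json


def compact_result(result, max_chars=4000):
    """Serialize a tool result; if it's too long, return a valid-JSON preview instead.

    max_chars bounds the preview, not the wrapper: JSON-escaping the preview
    adds some overhead, so treat the limit as approximate.
    """
    text = result if isinstance(result, str) else json.dumps(result, separators=(",", ":"))
    if len(text) <= max_chars:
        return text
    return json.dumps({
        "truncated": True,
        "original_chars": len(text),
        "preview": text[:max_chars],
        "note": "Result truncated; ask for a narrower query if you need more.",
    })


big = {"rows": [{"id": i, "name": f"item-{i}"} for i in range(5000)]}
out = compact_result(big)
parsed = json.loads(out)
print(parsed["truncated"], parsed["original_chars"])
```

In `run_agent`, this would replace the `json.dumps(result)` step when building each tool message's content.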