Tool use + function calling

Tool use (a.k.a. function calling) lets the model ask your code to call functions that fetch external data, perform actions, or invoke other services. It works on epithre-omni and epithre-prme via the standard tools schema.

Anatomy of a tool call

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city in Indonesia.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name in Indonesian"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
                },
                "required": ["city"],
            },
        },
    },
]

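# `client` is an already-configured API client (setup not shown here).
resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[{"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"}],  # "How's the weather in Jakarta today?"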
    tools=tools,
    tool_choice="auto",  # default
)

If the model decides to call the tool, the assistant message contains a tool_calls list:

resp.choices[0].message.tool_calls
# [{"id": "call_abc", "type": "function",
#   "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}]

Your code executes the function, then sends the result back as a role: "tool" message:

import json

# Execute
tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
weather_result = your_weather_api(args["city"])  # your own function; e.g., "Cerah 31C" ("Sunny 31C")

# Continue the conversation with the result
followup = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"},
        resp.choices[0].message,  # assistant message with tool_calls
        {"role": "tool",
         "tool_call_id": tool_call.id,
         "content": weather_result},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)
# "Cuaca di Jakarta hari ini cerah, sekitar 31C."  ("The weather in Jakarta today is sunny, around 31C.")
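The one detail that trips people up in the round trip above is the shape of the role: "tool" message: it must echo the tool call's id, and its content is text. The sketch below wraps that in a small helper. `tool_result_message` is a hypothetical name, and the SimpleNamespace object is only a stand-in for `resp.choices[0].message.tool_calls[0]` so the snippet runs offline; it mimics the attribute access (`.id`, `.function.arguments`) used in the example above.

```python
import json
from types import SimpleNamespace
from typing import Any


def tool_result_message(tool_call: Any, result: Any) -> dict:
    """Build the role: "tool" message that answers a single tool call."""
    # Send text content; JSON-encode anything that is not already a string.
    content = result if isinstance(result, str) else json.dumps(result)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


# Stand-in for resp.choices[0].message.tool_calls[0], so this runs offline.
fake_call = SimpleNamespace(
    id="call_abc",
    type="function",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Jakarta"}'),
)

args = json.loads(fake_call.function.arguments)
reply = tool_result_message(fake_call, {"city": args["city"], "summary": "Cerah 31C"})
print(reply)
```

Encoding structured results as JSON (rather than a Python repr) keeps them unambiguous for the model and matches the json.dumps(result) convention used in the multi-turn loop further down.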

tool_choice options

Value	Behavior
`"auto"` (default)	Model decides whether to call a tool or just reply with text.
`"none"`	Force text-only reply; tools are listed but won't be called.
`"required"`	Model MUST call a tool. Reliable across all backends since May 2026.
`{"type": "function", "function": {"name": "X"}}`	Force a specific tool. Works on all backends; preferred when you know the exact tool.

Note on tool_choice="required" (updated May 2026): the earlier stall on long prompts has been resolved on all backends, so "required" now works regardless of prompt length on both epithre-omni and epithre-prme. "auto" still has a slightly lower time to first token (TTFT) and remains the preferred choice for latency-sensitive paths.
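If your code picks tool_choice dynamically, a small helper can build each value in the table and catch an obvious mistake: forcing a named tool that was never included in tools, which is almost certainly a bug. `make_tool_choice` is a hypothetical helper, not part of any SDK; it only produces the request values listed above.

```python
from __future__ import annotations

from typing import Optional, Union

VALID_MODES = ("auto", "none", "required")


def make_tool_choice(tools: list[dict], mode: str = "auto",
                     name: Optional[str] = None) -> Union[str, dict]:
    """Return a tool_choice value for a chat request.

    Passing `name` forces that specific tool; otherwise `mode` must be
    one of the string options ("auto", "none", "required").
    """
    if name is not None:
        known = {t["function"]["name"] for t in tools}
        if name not in known:
            # Forcing a tool that was not sent in `tools` is a caller bug.
            raise ValueError(f"tool {name!r} is not in the tools list")
        return {"type": "function", "function": {"name": name}}
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tool_choice mode {mode!r}")
    return mode


tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {}}}]
print(make_tool_choice(tools, name="get_weather"))
print(make_tool_choice(tools, mode="required"))
```

The result goes straight into the request, e.g. tool_choice=make_tool_choice(tools, name="get_weather").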

Multi-tool, multi-turn

The model can chain multiple tool calls. Example: a customer-support bot that needs to look up an order, then check shipping status.

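# Descriptions and parameter schemas are elided ("..."); define them like get_weather above.
# lookup_order takes order_id and check_shipping takes tracking_number (see the loop below).
tools = [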
    {"type": "function", "function": {"name": "lookup_order", "description": "...",
                                      "parameters": {...}}},
    {"type": "function", "function": {"name": "check_shipping", "description": "...",
                                      "parameters": {...}}},
]

messages = [
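    {"role": "system", "content": "Kamu CS bot. Pake tools yang ada untuk jawab pertanyaan."},  # "You're a customer-service bot. Use the available tools to answer questions."
    {"role": "user", "content": "Status pesanan saya nomor ORD-12345?"},  # "What's the status of my order, number ORD-12345?"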
]

# Loop until the model produces a final text response
for _ in range(5):  # safety cap on rounds
    resp = client.chat.completions.create(
        model="epithre-omni",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        # Model produced a text answer; we're done
        print(msg.content)
        break

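    # Execute each tool call and append results
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        if call.function.name == "lookup_order":
            result = lookup_order_db(args["order_id"])
        elif call.function.name == "check_shipping":
            result = shipping_api(args["tracking_number"])
        else:
            # Unknown tool name: report it to the model instead of crashing on an unset `result`
            result = {"error": f"unknown tool {call.function.name}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
else:
    # Runs only if the loop never reached `break`
    print("Stopped after 5 rounds without a final answer.")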

Cap the loop with a max iteration count. The model occasionally gets stuck in tool-call loops on ambiguous tasks.
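As the number of tools grows, the if/elif chain in the loop gets hard to maintain. A common alternative is a dispatch table that maps tool names to Python functions. The sketch below assumes hypothetical stand-ins (`lookup_order`, `check_shipping`, `TOOL_REGISTRY`, `execute_tool_calls`) for your real lookup_order_db and shipping_api; it also turns bad JSON or unexpected arguments into an error result the model can read.

```python
from __future__ import annotations

import json
from types import SimpleNamespace
from typing import Any, Callable


# Hypothetical backends standing in for lookup_order_db / shipping_api.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "tracking_number": "TRK-001"}


def check_shipping(tracking_number: str) -> dict:
    return {"tracking_number": tracking_number, "status": "in transit"}


# Map each tool name the model may emit to the function that serves it.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "lookup_order": lookup_order,
    "check_shipping": check_shipping,
}


def execute_tool_calls(tool_calls, registry=TOOL_REGISTRY) -> list[dict]:
    """Run every call from one assistant turn; return one role: "tool" message per call."""
    results = []
    for call in tool_calls:
        fn = registry.get(call.function.name)
        if fn is None:
            payload = {"error": f"unknown tool {call.function.name}"}
        else:
            try:
                payload = fn(**json.loads(call.function.arguments))
            except (json.JSONDecodeError, TypeError) as exc:
                # Bad JSON or missing/unexpected arguments: report it to the model.
                payload = {"error": str(exc)}
        results.append({"role": "tool", "tool_call_id": call.id,
                        "content": json.dumps(payload)})
    return results


def make_call(call_id: str, name: str, arguments: str) -> SimpleNamespace:
    """Offline stand-in for one entry of msg.tool_calls."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name=name, arguments=arguments))


out = execute_tool_calls([
    make_call("call_1", "lookup_order", '{"order_id": "ORD-12345"}'),
    make_call("call_2", "cancel_order", '{"order_id": "ORD-12345"}'),
])
```

Inside the loop, the whole inner for block would then shrink to messages.extend(execute_tool_calls(msg.tool_calls)), and adding a tool means adding one registry entry.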

Parallel tool calls

The model may emit multiple tool calls in one response when the user's question requires multiple lookups:

# User: "Cuaca Jakarta dan Surabaya hari ini?"  ("Weather in Jakarta and Surabaya today?")
resp.choices[0].message.tool_calls
# [
#   {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}},
#   {"id": "call_2", "function": {"name": "get_weather", "arguments": '{"city": "Surabaya"}'}},
# ]

Execute them concurrently (for example with asyncio.gather or a thread pool), then append one role: "tool" message per call, each carrying its matching tool_call_id, before the next inference call.
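A minimal sketch of the asyncio.gather approach, assuming a hypothetical blocking get_weather function and assuming every call in the batch targets that one tool. asyncio.to_thread runs each blocking call in a worker thread, and gather returns the results in the same order as the input calls, so each tool message lines up with its tool_call_id.

```python
import asyncio
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical blocking lookup (for example an HTTP call to a weather service)."""
    return f"weather for {city}"


async def run_tool_calls_in_parallel(tool_calls) -> list:
    async def run_one(call) -> dict:
        args = json.loads(call.function.arguments)
        # to_thread keeps the blocking function from stalling the event loop.
        result = await asyncio.to_thread(get_weather, **args)
        return {"role": "tool", "tool_call_id": call.id, "content": result}

    # gather runs the calls concurrently and returns results in input order.
    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))


# Offline stand-ins for the two tool calls shown above.
calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(name="get_weather", arguments='{"city": "Jakarta"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(name="get_weather", arguments='{"city": "Surabaya"}')),
]
tool_messages = asyncio.run(run_tool_calls_in_parallel(calls))
# messages.extend(tool_messages), then make the next inference call
```

A concurrent.futures.ThreadPoolExecutor with map would give the same ordering guarantee in fully synchronous code.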

Schema best practices

Be specific in descriptions. "Get weather" is OK; "Get current weather for an Indonesian city by name" is better. The model reads descriptions to decide when to call.
Use enum for restricted values instead of a plain string with the allowed values described in prose. The grammar layer enforces enum values.
Mark mandatory fields as required. The model is generally good at filling them in, but required makes the schema authoritative.
Tool names should be descriptive snake_case verbs: get_weather, search_orders, send_email, not tool1 or helper.
Don't expose internal state as tools. Wrap your internals and expose intent-level functions.
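Some of these rules are mechanical enough to check in CI before a tool definition ships. The sketch below is a hypothetical lint helper, not part of any SDK: it flags non-snake_case names, missing or very short descriptions (the 4-word threshold is an arbitrary assumption), and required fields that are not defined in properties. It cannot judge whether a name is descriptive, so tool1 still passes the snake_case check.

```python
from __future__ import annotations

import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")
MIN_DESCRIPTION_WORDS = 4  # arbitrary threshold for "too vague"


def lint_tool(tool: dict) -> list[str]:
    """Return warnings for one tool definition; an empty list means no issues found."""
    problems = []
    fn = tool.get("function", {})
    name = fn.get("name", "")
    if not SNAKE_CASE.match(name):
        problems.append(f"name {name!r} is not snake_case")
    if len(fn.get("description", "").split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description is missing or very short")
    params = fn.get("parameters", {})
    properties = params.get("properties", {})
    for field in params.get("required", []):
        if field not in properties:
            problems.append(f"required field {field!r} is not defined in properties")
    return problems


# The get_weather definition from the top of this page.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city in Indonesia.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name in Indonesian"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
            },
            "required": ["city"],
        },
    },
}
print(lint_tool(weather_tool))
```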

Common failure modes

Model hallucinates arguments: it passes a city that doesn't exist in your DB. Solution: validate inputs in your tool implementation and return a clear error result such as {"error": "city not found"}. The model will often recover and ask the user for clarification.
Model loops calling the same tool: it calls the tool, gets a result, then calls it again with the same arguments. Solution: in your handler, detect repeated identical calls and return a more directive message: {"note": "you already called this with the same args, the previous result was X"}.
JSON parsing of arguments fails: rare, but it happens with very complex schemas. Solution: wrap json.loads(call.function.arguments) in try/except and feed the error back to the model: {"role": "tool", "tool_call_id": call.id, "content": "Argument parse failed: invalid JSON. Please retry with valid JSON."}.
Tool not called when expected: the model decided a text-only reply was sufficient. Solution: make the system prompt less ambiguous ("ALWAYS use the search tool for product questions") or force the specific tool with tool_choice={"type": "function", "function": {"name": "X"}}.
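The first three fixes can live in one handler. The sketch below is a hypothetical get_weather handler: KNOWN_CITIES stands in for your real city table and the weather result is a stub. It returns the parse-failure message for invalid JSON, an error result for unknown cities, and the directive note when the model repeats an identical call. Repeats are detected by comparing the tool name plus the arguments re-serialized with sorted keys, so key order in the model's JSON does not matter.

```python
from __future__ import annotations

import json
from types import SimpleNamespace

KNOWN_CITIES = {"Jakarta", "Surabaya"}  # hypothetical contents of your city table


class WeatherToolGuard:
    """Hypothetical get_weather handler applying the first three fixes above."""

    def __init__(self) -> None:
        # (tool name, canonical args JSON) -> content already returned to the model
        self.previous: dict[tuple[str, str], str] = {}

    @staticmethod
    def _reply(call, content: str) -> dict:
        return {"role": "tool", "tool_call_id": call.id, "content": content}

    def handle(self, call) -> dict:
        # 1. Arguments that are not valid JSON: ask the model to retry.
        try:
            args = json.loads(call.function.arguments)
        except json.JSONDecodeError:
            return self._reply(call, "Argument parse failed: invalid JSON. Please retry with valid JSON.")
        if not isinstance(args, dict):
            return self._reply(call, json.dumps({"error": "arguments must be a JSON object"}))

        # 2. Identical repeat call: point the model at the earlier result.
        key = (call.function.name, json.dumps(args, sort_keys=True))
        if key in self.previous:
            note = ("you already called this with the same args, "
                    f"the previous result was {self.previous[key]}")
            return self._reply(call, json.dumps({"note": note}))

        # 3. Hallucinated argument: validate against your own data.
        city = args.get("city")
        if city not in KNOWN_CITIES:
            content = json.dumps({"error": "city not found"})
        else:
            content = json.dumps({"city": city, "summary": "Cerah 31C"})  # stub result
        self.previous[key] = content
        return self._reply(call, content)


def make_call(call_id: str, arguments: str) -> SimpleNamespace:
    """Offline stand-in for one get_weather tool call."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name="get_weather", arguments=arguments))


guard = WeatherToolGuard()
```

Create one guard per conversation so repeat detection does not leak across users.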

Streaming with tool calls

You can stream a tool-using response. The tool call appears across multiple SSE delta chunks; accumulate them.

stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[...],
    tools=tools,
    stream=True,
)

tool_call_accumulator = {}  # by index
for chunk in stream:
    if not chunk.choices:  # e.g. a usage-only chunk, if the server sends one
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            slot = tool_call_accumulator.setdefault(idx, {"id": None, "name": "", "args": ""})
            if tc.id:
                slot["id"] = tc.id
            fn = tc.function  # may be None on some deltas
            if fn and fn.name:
                slot["name"] += fn.name
            if fn and fn.arguments:
                slot["args"] += fn.arguments
    if delta.content:
        print(delta.content, end="")

# After the stream ends, tool_call_accumulator holds the full tool calls;
# parse each slot["args"] with json.loads only at this point.

Most apps don't need streaming for tool calls; non-streaming is simpler. Stream only if your UX shows partial output while the model decides.
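Because the accumulation logic is easy to get subtly wrong, it helps to keep it in a pure function you can unit-test without a live stream. The sketch below (`accumulate_stream`, `fake_chunk`, `fake_tc` are hypothetical names) applies the same index-keyed merging as the loop above, then parses the argument strings once the stream is done. The SimpleNamespace chunks only imitate the chunk.choices[0].delta.tool_calls shape used above; real deltas come from the SDK.

```python
import json
from types import SimpleNamespace as NS


def accumulate_stream(chunks):
    """Merge streamed deltas into (text, tool_calls) after the stream is finished."""
    slots = {}  # tool-call index -> partially assembled call
    text_parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a usage-only chunk, if the server sends one
            continue
        delta = chunk.choices[0].delta
        for tc in delta.tool_calls or []:
            slot = slots.setdefault(tc.index, {"id": None, "name": "", "args": ""})
            if tc.id:
                slot["id"] = tc.id
            fn = tc.function  # may be None on some deltas
            if fn and fn.name:
                slot["name"] += fn.name
            if fn and fn.arguments:
                slot["args"] += fn.arguments
        if delta.content:
            text_parts.append(delta.content)
    # Argument strings are only complete after the last chunk, so parse them here.
    calls = [
        {"id": s["id"], "name": s["name"], "arguments": json.loads(s["args"] or "{}")}
        for _, s in sorted(slots.items())
    ]
    return "".join(text_parts), calls


def fake_chunk(tool_calls=None, content=None):
    return NS(choices=[NS(delta=NS(tool_calls=tool_calls, content=content))])


def fake_tc(index, id=None, name=None, arguments=None):
    return NS(index=index, id=id, function=NS(name=name, arguments=arguments))


chunks = [
    fake_chunk([fake_tc(0, id="call_1", name="get_weather", arguments="")]),
    fake_chunk([fake_tc(0, arguments='{"city": ')]),
    fake_chunk([fake_tc(1, id="call_2", name="get_weather", arguments='{"city": "Surabaya"}')]),
    fake_chunk([fake_tc(0, arguments='"Jakarta"}')]),
    NS(choices=[]),
]
text, calls = accumulate_stream(chunks)
```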

Cost considerations

Each tool-use round costs at minimum: prompt + tool definition tokens + tool result tokens + completion tokens. Multi-round chains stack up quickly, since every round resends the whole conversation so far. Three tactics to keep cost down:

Trim tool definitions: only pass tools relevant to the current query. If your bot has 50 tools, dynamically include the 5 most likely.
Cache the tool definitions in your system prompt via cache_control. Tools themselves don't yet hit the cache rate, but the system prompt around them does.
Compress tool results: don't dump 100KB of raw JSON. Summarize / truncate before feeding back.
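The trimming and compression tactics can start very simple. The sketch below makes two assumptions worth stating: select_tools uses a naive keyword-overlap score (an embedding search or intent classifier would usually rank better, especially when user queries and tool descriptions are in different languages), and compress_result budgets characters rather than tokens, with a 2000-character default chosen arbitrarily. Both function names are hypothetical.

```python
import json


def select_tools(tools, query, k=5):
    """Naive keyword-overlap ranking: keep the k tools most related to the query."""
    query_words = set(query.lower().split())

    def score(tool):
        fn = tool["function"]
        words = set(fn["name"].lower().split("_"))
        words |= set(fn.get("description", "").lower().split())
        return len(query_words & words)

    # sorted() is stable (also with reverse=True), so ties keep their original order.
    return sorted(tools, key=score, reverse=True)[:k]


def compress_result(result, max_chars=2000):
    """Serialize a tool result compactly and cut it to a character budget."""
    text = result if isinstance(result, str) else json.dumps(result, separators=(",", ":"))
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"... [truncated {dropped} chars]"


tools = [
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city in Indonesia."}},
    {"type": "function", "function": {"name": "lookup_order", "description": "Look up an order by its order ID."}},
    {"type": "function", "function": {"name": "check_shipping", "description": "Check shipping status for a tracking number."}},
]
print([t["function"]["name"] for t in select_tools(tools, "check shipping status", k=1)])
```

Truncation tells the model that data was cut, which is usually better than silently dropping it; for list-shaped results, keeping only the first few items before serializing is often cleaner than cutting mid-JSON.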

Cookbook: function calling - more concrete patterns.
Cookbook: agentic chains - multi-step tool loops with retry/recovery.
Structured output guide - when to use response_format vs tool calling.