Tool use + function calling

Tool use (a.k.a. function calling) lets the model call functions in your code to fetch external data, perform actions, or invoke other services. It works on epithre-omni and epithre-prme through the standard tools schema.

Anatomy of a tool call

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city in Indonesia.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name in Indonesian"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
                },
                "required": ["city"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[{"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"}],  # "How's the weather in Jakarta today?"
    tools=tools,
    tool_choice="auto",  # default
)

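If the model decides to call the tool, the message's tool_calls field looks like this (shown as JSON; the Python SDK returns objects with the same fields, read as attributes such as tool_call.function.arguments):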

resp.choices[0].message.tool_calls
# [{"id": "call_abc", "type": "function",
#   "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}]

Your code executes the function, then sends the result back as a role: "tool" message:

import json

# Execute
tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
weather_result = your_weather_api(args["city"])  # e.g., "Cerah 31C" ("Sunny, 31C")

# Continue the conversation with the result
followup = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"},
        resp.choices[0].message,  # assistant message with tool_calls
        {"role": "tool",
         "tool_call_id": tool_call.id,
         "content": weather_result},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)
# "Cuaca di Jakarta hari ini cerah, sekitar 31C."  (Jakarta is sunny today, around 31C.)
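Every example in this guide passes the tool result back as a string in content: weather_result is already text, and the multi-tool loop further down uses json.dumps. A minimal sketch of a helper that handles both cases; the name tool_message is hypothetical and not part of any SDK:

```python
import json
from typing import Any


def tool_message(tool_call_id: str, result: Any) -> dict:
    """Build a role "tool" message; non-string results are JSON-encoded."""
    if isinstance(result, str):
        # Plain text such as "Cerah 31C" passes through unchanged
        content = result
    else:
        # ensure_ascii=False keeps Indonesian text readable instead of \u escapes
        content = json.dumps(result, ensure_ascii=False)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


print(tool_message("call_abc", "Cerah 31C"))
print(tool_message("call_abc", {"city": "Jakarta", "temp_c": 31}))
```

Use it in place of the hand-written role "tool" dict when you send the follow-up request.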

tool_choice options

Value                                              Behavior
"auto" (default)                                   Model decides whether to call a tool or just reply with text.
"none"                                             Force text-only reply; tools are listed but won't be called.
"required"                                         Model MUST call a tool. Reliable across all backends since May 2026.
{"type": "function", "function": {"name": "X"}}    Force a specific tool. Works on all backends; preferred when you know the exact tool.

tool_choice="required" reliability (updated May 2026): the earlier stall on long prompts is resolved on all backends, so "required" now works regardless of prompt length on both epithre-omni and epithre-prme. "auto" still has a slightly lower time to first token (TTFT) and remains the preferred choice for latency-sensitive paths.
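When you force a specific tool, a typo in the name is easy to miss. The sketch below builds the forced-tool tool_choice value from the table and checks locally that the name is actually declared in tools before any request is sent. force_tool is a hypothetical helper, not an SDK function:

```python
def force_tool(name: str, tools: list[dict]) -> dict:
    """Return a tool_choice value that forces `name`, after checking it is declared."""
    declared = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    if name not in declared:
        raise ValueError(f"unknown tool {name!r}; declared tools: {sorted(declared)}")
    return {"type": "function", "function": {"name": name}}


# Abbreviated version of the get_weather schema from the top of this page
weather_tools = [
    {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object"}}}
]
choice = force_tool("get_weather", weather_tools)
# Then: client.chat.completions.create(..., tools=weather_tools, tool_choice=choice)
```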

Multi-tool, multi-turn

The model can chain multiple tool calls across rounds. For example, a customer-support bot may need to look up an order and then check its shipping status.

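tools = [
    {"type": "function", "function": {
        "name": "lookup_order", "description": "...",
        "parameters": {"type": "object",
                       "properties": {"order_id": {"type": "string"}},
                       "required": ["order_id"]}}},
    {"type": "function", "function": {
        "name": "check_shipping", "description": "...",
        "parameters": {"type": "object",
                       "properties": {"tracking_number": {"type": "string"}},
                       "required": ["tracking_number"]}}},
]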

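messages = [
    # "You're a customer-service bot. Use the available tools to answer questions."
    {"role": "system", "content": "Kamu CS bot. Pake tools yang ada untuk jawab pertanyaan."},
    # "What's the status of my order, number ORD-12345?"
    {"role": "user", "content": "Status pesanan saya nomor ORD-12345?"},
]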

import json

# Loop until the model produces a final text response
for _ in range(5):  # safety cap on rounds
    resp = client.chat.completions.create(
        model="epithre-omni",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        # Model produced a text answer; we're done
        print(msg.content)
        break

    # Execute each tool call and append results
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        if call.function.name == "lookup_order":
            result = lookup_order_db(args["order_id"])
        elif call.function.name == "check_shipping":
            result = shipping_api(args["tracking_number"])
        else:
            # Unknown tool name: report it instead of reusing a stale `result`
            result = {"error": f"unknown tool {call.function.name}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
else:
    # for/else: runs only if the cap was hit without a break (no final answer)
    print("Stopped after 5 rounds without a final answer.")

Cap the loop with a max iteration count. The model occasionally gets stuck in tool-call loops on ambiguous tasks.
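The if/elif chain in the loop grows with every tool you add, and a malformed arguments string raises json.JSONDecodeError in the middle of your loop. One common alternative is a name-to-function registry that turns failures into error payloads the model can read. A sketch; lookup_order, check_shipping (as Python functions), TOOL_HANDLERS, and run_tool are hypothetical stand-ins for your own lookup_order_db and shipping_api:

```python
import json
from typing import Any, Callable


# Hypothetical stand-ins for lookup_order_db / shipping_api
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "tracking_number": "TRK-001"}


def check_shipping(tracking_number: str) -> dict:
    return {"tracking_number": tracking_number, "status": "in transit"}


# Keys must match the "name" fields declared in `tools`
TOOL_HANDLERS: dict[str, Callable[..., Any]] = {
    "lookup_order": lookup_order,
    "check_shipping": check_shipping,
}


def run_tool(name: str, arguments: str) -> str:
    """Run one tool call; return the JSON string to use as the tool message content.

    Failures come back as {"error": ...} instead of raising, so the loop keeps
    going and the model sees what went wrong.
    """
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc}"})
    try:
        return json.dumps(handler(**args))
    except TypeError as exc:  # wrong argument names, or arguments not a JSON object
        return json.dumps({"error": f"bad arguments: {exc}"})


# Inside the loop:
# for call in msg.tool_calls:
#     messages.append({"role": "tool", "tool_call_id": call.id,
#                      "content": run_tool(call.function.name, call.function.arguments)})
```

Returning errors to the model rather than raising is a design choice: it lets the model retry or answer without the tool, but you may prefer to raise for errors that indicate a bug in your own code.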

Parallel tool calls

The model may emit multiple tool calls in one response when the user's question requires multiple lookups:

# User: "Cuaca Jakarta dan Surabaya hari ini?"  (Weather in Jakarta and Surabaya today?)
resp.choices[0].message.tool_calls
# [
#   {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}},
#   {"id": "call_2", "function": {"name": "get_weather", "arguments": '{"city": "Surabaya"}'}},
# ]

Execute them concurrently (for example with asyncio.gather or a thread pool), then append every role: "tool" result before the next inference call.
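A minimal asyncio.gather sketch, assuming your tool functions are ordinary blocking Python functions. get_weather here is a hypothetical stand-in for your_weather_api, the HANDLERS registry is illustrative, and the tool calls are simulated with SimpleNamespace objects so the example runs without an API key:

```python
import asyncio
import json
import time
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical blocking lookup standing in for your_weather_api."""
    time.sleep(0.1)  # simulate network latency
    return f"Cerah 31C di {city}"


# Hypothetical name -> function registry for the tools the model may call
HANDLERS = {"get_weather": get_weather}


async def run_one(call) -> dict:
    """Run one tool call in a worker thread and build its role "tool" message."""
    args = json.loads(call.function.arguments)
    handler = HANDLERS[call.function.name]
    # asyncio.to_thread (Python 3.9+) keeps the blocking call off the event loop
    result = await asyncio.to_thread(handler, **args)
    return {"role": "tool", "tool_call_id": call.id, "content": result}


async def run_parallel(tool_calls) -> list[dict]:
    # gather returns results in the same order as tool_calls
    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))


# Simulated tool_calls with the same attribute shape as the SDK objects
calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Jakarta"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Surabaya"}')),
]
tool_messages = asyncio.run(run_parallel(calls))
# messages.extend(tool_messages), then make the next create() call
```

Because gather preserves order, each result stays paired with its tool_call_id; the pairing is what matters to the model, not the order in which the lookups finish.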

Schema best practices

Common failure modes

Streaming with tool calls

You can stream a tool-using response. Each tool call arrives split across multiple SSE delta chunks; accumulate the fragments by their index.

stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[...],
    tools=tools,
    stream=True,
)

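tool_call_accumulator = {}  # keyed by tool-call index
for chunk in stream:
    if not chunk.choices:  # skip chunks without choices (e.g. a usage-only chunk, if sent)
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            slot = tool_call_accumulator.setdefault(tc.index, {"id": None, "name": "", "args": ""})
            if tc.id:
                slot["id"] = tc.id
            # tc.function may be absent on some fragments, so guard before reading it
            if tc.function and tc.function.name:
                slot["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                slot["args"] += tc.function.arguments
    if delta.content:
        print(delta.content, end="")

# After the stream ends, tool_call_accumulator holds the full tool calls;
# each "args" value is the complete JSON arguments string.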

Most apps don't need streaming for tool calls; non-streaming is simpler. Stream only if your UX shows partial output while the model decides.
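After a streamed response, the accumulator holds raw strings, and you still need an assistant message with tool_calls to continue the conversation. A sketch, assuming the accumulator shape used above and the message shape shown at the top of this page; finalize_tool_calls is a hypothetical helper:

```python
import json


def finalize_tool_calls(accumulator: dict[int, dict]) -> dict:
    """Rebuild an assistant message from the streamed tool-call fragments.

    Raises json.JSONDecodeError if any arguments string is incomplete,
    for example because the stream was cut off.
    """
    tool_calls = []
    for idx in sorted(accumulator):  # indices give the model's call order
        slot = accumulator[idx]
        args = slot["args"] or "{}"  # treat an empty string as "no arguments"
        json.loads(args)  # validate only; the message keeps arguments as a string
        tool_calls.append({
            "id": slot["id"],
            "type": "function",
            "function": {"name": slot["name"], "arguments": args},
        })
    # content is None here; set it to any streamed text if your app kept it
    return {"role": "assistant", "content": None, "tool_calls": tool_calls}


# Example accumulator, as the streaming loop might leave it
acc = {
    1: {"id": "call_2", "name": "get_weather", "args": '{"city": "Surabaya"}'},
    0: {"id": "call_1", "name": "get_weather", "args": '{"city": "Jakarta"}'},
}
assistant_msg = finalize_tool_calls(acc)
# messages.append(assistant_msg), then run each call and append role "tool" results
```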

Cost considerations

Each tool-use round costs at minimum: prompt tokens + tool definition tokens + tool result tokens + completion tokens. Multi-round chains add up quickly, because every round resends the whole conversation so far, including earlier tool results. Three tactics to keep cost down:

  1. Trim tool definitions: only pass tools relevant to the current query. If your bot has 50 tools, dynamically include the 5 most likely.
  2. Cache your system prompt via cache_control. Tool definitions themselves don't yet get the cached-token rate, but the system prompt around them does.
  3. Compress tool results: don't dump 100KB of raw JSON. Summarize / truncate before feeding back.
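Tactics 1 and 3 can be sketched in a few lines. This is a rough illustration, not a recommended ranking method: select_tools scores tools by naive keyword overlap between the query and each tool's name and description (a stand-in for whatever relevance scoring you actually use), and compress_result truncates serialized results to a character budget. Both names and the default limits are hypothetical:

```python
import json


def select_tools(query: str, tools: list[dict], k: int = 5) -> list[dict]:
    """Keep the k tools whose name/description share the most words with the query."""
    words = set(query.lower().split())

    def score(tool: dict) -> int:
        fn = tool["function"]
        text = f'{fn["name"].replace("_", " ")} {fn.get("description", "")}'.lower()
        return len(words & set(text.split()))

    # sorted() is stable, so tools with equal scores keep their original order
    return sorted(tools, key=score, reverse=True)[:k]


def compress_result(result, max_chars: int = 2000) -> str:
    """Serialize a tool result and cut it to max_chars, noting how much was dropped."""
    text = result if isinstance(result, str) else json.dumps(result, ensure_ascii=False)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"... [truncated {len(text) - max_chars} chars]"


catalog = [
    {"type": "function", "function": {"name": "lookup_order", "description": "Find an order by id"}},
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city"}},
]
print([t["function"]["name"] for t in select_tools("weather in jakarta", catalog, k=1)])
print(compress_result("x" * 30, max_chars=10))
```

Note that truncated JSON is no longer valid JSON. That is usually acceptable when only the model reads it, but summarizing (dropping fields you know the model doesn't need) keeps the structure intact.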