Tool use + function calling
Tool use (a.k.a. function calling) lets the model call functions in your code to fetch external data, perform actions, or invoke other services. It works on epithre-omni and epithre-prme via the standard tools schema.
Anatomy of a tool call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city in Indonesia.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name in Indonesian"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
                },
                "required": ["city"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="epithre-omni",
    # User asks (Indonesian): "How's the weather in Jakarta today?"
    messages=[{"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"}],
    tools=tools,
    tool_choice="auto",  # default
)
If the model decides to call the tool, the response message carries the call in tool_calls:
resp.choices[0].message.tool_calls
# [{"id": "call_abc", "type": "function",
# "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}]
Your code executes the function, then sends the result back as a role: "tool" message:
import json

# Execute the requested tool
tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
weather_result = your_weather_api(args["city"])  # e.g., "Cerah 31C" ("Sunny 31C")

# Continue the conversation with the result
followup = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "user", "content": "Cuaca di Jakarta hari ini gimana?"},
        resp.choices[0].message,  # assistant message with tool_calls
        {"role": "tool",
         "tool_call_id": tool_call.id,
         "content": weather_result},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)
# "Cuaca di Jakarta hari ini cerah, sekitar 31C."
# ("The weather in Jakarta today is sunny, around 31C.")
tool_choice options
| Value | Behavior |
|---|---|
| "auto" (default) | Model decides whether to call a tool or just reply with text. |
| "none" | Force text-only reply; tools are listed but won't be called. |
| "required" | Model MUST call a tool. Reliable across all backends since May 2026. |
| {"type": "function", "function": {"name": "X"}} | Force a specific tool. Works on all backends; preferred when you know the exact tool. |
tool_choice="required" reliability (updated May 2026): the prior long-prompt stall is resolved across all backends, so "required" now works regardless of prompt length on both epithre-omni and epithre-prme. "auto" retains slightly lower TTFT (time to first token) and remains preferred for latency-sensitive paths.
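The forced named-tool form from the table can be built with a small helper that also checks the name against your tools list before the request goes out. This is a minimal sketch: named_tool_choice is a hypothetical helper name, not part of any SDK, and the example tool list is illustrative.

```python
def named_tool_choice(tools, name):
    """Build a tool_choice value that forces the tool called `name`.

    Hypothetical helper (not part of any SDK): it raises early if the name
    is missing from `tools` instead of letting the request fail later.
    """
    known = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    if name not in known:
        raise ValueError(f"unknown tool {name!r}; known tools: {sorted(known)}")
    return {"type": "function", "function": {"name": name}}


# Illustrative tool list with a single function tool.
example_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city in Indonesia.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

choice = named_tool_choice(example_tools, "get_weather")
print(choice)
```

The returned value is what you would pass as tool_choice=choice in the create call.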
Multi-tool, multi-turn
The model can chain multiple tool calls. Example: a customer-support bot that needs to look up an order, then check shipping status.
tools = [
    {"type": "function", "function": {"name": "lookup_order", "description": "...",
                                      "parameters": {...}}},
    {"type": "function", "function": {"name": "check_shipping", "description": "...",
                                      "parameters": {...}}},
]

messages = [
    # System (Indonesian): "You are a CS bot. Use the available tools to answer questions."
    {"role": "system", "content": "Kamu CS bot. Pake tools yang ada untuk jawab pertanyaan."},
    # User (Indonesian): "What is the status of my order number ORD-12345?"
    {"role": "user", "content": "Status pesanan saya nomor ORD-12345?"},
]

# Loop until the model produces a final text response
for _ in range(5):  # safety cap on rounds
    resp = client.chat.completions.create(
        model="epithre-omni",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        # Model produced a text answer; we're done
        print(msg.content)
        break

    # Execute each tool call and append results
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        if call.function.name == "lookup_order":
            result = lookup_order_db(args["order_id"])
        elif call.function.name == "check_shipping":
            result = shipping_api(args["tracking_number"])
        else:
            # Unknown tool name: report it instead of reusing a stale result
            result = {"error": f"unknown tool: {call.function.name}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
Cap the loop with a max iteration count. The model occasionally gets stuck in tool-call loops on ambiguous tasks.
Parallel tool calls
The model may emit multiple tool calls in one response when the user's question requires multiple lookups:
# User: "Cuaca Jakarta dan Surabaya hari ini?"
resp.choices[0].message.tool_calls
# [
# {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}},
# {"id": "call_2", "function": {"name": "get_weather", "arguments": '{"city": "Surabaya"}'}},
# ]
Execute them in parallel (asyncio.gather or threadpool), then append both role: "tool" results before the next inference call.
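A minimal sketch of the thread-pool variant, assuming tool calls are plain dicts shaped like the example above (with SDK response objects you would read call.function.name and call.id instead). The helper name run_tool_calls_in_parallel and the fake_get_weather stand-in are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls_in_parallel(tool_calls, handlers, max_workers=8):
    """Run every tool call concurrently and return role: "tool" messages.

    `tool_calls` are plain dicts shaped like the example above;
    `handlers` maps tool name -> callable taking keyword arguments.
    """
    def run_one(call):
        fn = handlers[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        return {
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with tool_calls.
        return list(pool.map(run_one, tool_calls))


# Stand-in for a real weather lookup (illustrative data only).
def fake_get_weather(city, unit="celsius"):
    return {"city": city, "condition": "cerah", "unit": unit}


calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}},
    {"id": "call_2", "function": {"name": "get_weather", "arguments": '{"city": "Surabaya"}'}},
]
tool_messages = run_tool_calls_in_parallel(calls, {"get_weather": fake_get_weather})
# Append the assistant message first, then messages.extend(tool_messages).
```

Threads suit blocking I/O such as HTTP or database lookups. If your handlers are already coroutines, asyncio.gather is the equivalent choice.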
Schema best practices
- Be specific in descriptions. "Get weather" is OK; "Get current weather for an Indonesian city by name" is better. The model reads descriptions to decide when to call.
- Use enum for restricted values instead of a plain string described in prose. The grammar layer will enforce it.
- List your required fields under required. The model is generally good at filling them, but required makes the schema authoritative.
- Tool names should be snake_case, descriptive verbs: get_weather, search_orders, send_email, not tool1 or helper.
- Don't expose internal state as tools. Wrap your internals; expose intent-level functions.
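Some of these practices can be checked mechanically before a tool list ships. The following is a rough sketch of such a check. The function name lint_tool and its thresholds are hypothetical choices, and the checks are heuristics rather than anything the API enforces.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_tool(tool, min_description_len=20):
    """Return a list of warnings for one tool definition (heuristic checks only)."""
    fn = tool["function"]
    params = fn.get("parameters", {})
    props = params.get("properties", {})
    warnings = []
    if not SNAKE_CASE.match(fn["name"]):
        warnings.append(f"name {fn['name']!r} is not snake_case")
    if len(fn.get("description", "")) < min_description_len:
        warnings.append("description is short; say what the tool does and when to use it")
    if props and not params.get("required"):
        warnings.append("no 'required' list; mark the fields the tool cannot run without")
    for field in params.get("required", []):
        if field not in props:
            warnings.append(f"required field {field!r} is not declared in properties")
    return warnings


good_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city in Indonesia.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
vague_tool = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

good_warnings = lint_tool(good_tool)
vague_warnings = lint_tool(vague_tool)
```

Running a check like this in CI catches schema drift early. It cannot judge whether a description is actually clear to the model, so treat it as a first filter.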
Common failure modes
- Model hallucinates arguments: passes a city that doesn't exist in your DB. Solution: validate inputs in your tool implementation and return a clear error result like {"error": "city not found"}. The model will often recover and ask the user for clarification.
- Model loops calling the same tool: it asked once, got a result, then asked again with the same args. Solution: in your handler, detect repeated identical calls and return a more directive message: {"note": "you already called this with the same args, the previous result was X"}.
- JSON parsing of arguments fails: rare, but it happens with very complex schemas. Solution: wrap json.loads(call.function.arguments) in try/except and feed the error back to the model: {"role": "tool", "tool_call_id": call.id, "content": "Argument parse failed: invalid JSON. Please retry with valid JSON."}.
- Tool not called when expected: the model decided text-only was sufficient. Solution: lower system-prompt ambiguity ("ALWAYS use the search tool for product questions") or switch to forced named-tool choice.
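The first three mitigations can live in one defensive wrapper around tool execution. This is a sketch under assumptions: handle_tool_call, the seen cache, and the convention that handlers raise ValueError or KeyError on bad input are all hypothetical, and the city list is illustrative data.

```python
import json


def handle_tool_call(call_id, name, raw_arguments, handlers, seen):
    """Run one tool call defensively and return a role: "tool" message.

    `handlers` maps tool name -> callable(dict) -> JSON-serializable result.
    `seen` caches results by (name, canonical args) to catch repeated calls.
    """
    def reply(payload):
        return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(payload)}

    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return reply({"error": "Argument parse failed: invalid JSON. Please retry with valid JSON."})
    if not isinstance(args, dict):
        return reply({"error": "Arguments must be a JSON object."})

    if name not in handlers:
        return reply({"error": f"unknown tool: {name}"})

    key = (name, json.dumps(args, sort_keys=True))
    if key in seen:
        return reply({"note": "you already called this with the same args",
                      "previous_result": seen[key]})

    try:
        result = handlers[name](args)
    except (KeyError, ValueError) as exc:
        # Handlers signal bad input by raising; surface it as a result the model can read.
        result = {"error": str(exc)}
    seen[key] = result
    return reply(result)


KNOWN_CITIES = {"Jakarta", "Surabaya"}  # illustrative data


def get_weather(args):
    if args.get("city") not in KNOWN_CITIES:
        raise ValueError("city not found")
    return {"city": args["city"], "condition": "cerah"}


handlers = {"get_weather": get_weather}
seen = {}
first = handle_tool_call("call_1", "get_weather", '{"city": "Jakarta"}', handlers, seen)
repeat = handle_tool_call("call_2", "get_weather", '{"city": "Jakarta"}', handlers, seen)
missing = handle_tool_call("call_3", "get_weather", '{"city": "Atlantis"}', handlers, seen)
broken = handle_tool_call("call_4", "get_weather", '{"city": ', handlers, seen)
```

Every path returns a message with the original tool_call_id, so the conversation stays well-formed even when the call itself fails. Keep one seen dict per conversation so repeats are only flagged within the same chain.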
Streaming with tool calls
You can stream a tool-using response. The tool call appears across multiple SSE delta chunks; accumulate them.
stream = client.chat.completions.create(
    model="epithre-omni",
    messages=[...],
    tools=tools,
    stream=True,
)

tool_call_accumulator = {}  # keyed by tool-call index
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            slot = tool_call_accumulator.setdefault(idx, {"id": None, "name": "", "args": ""})
            if tc.id:
                slot["id"] = tc.id
            if tc.function.name:
                slot["name"] += tc.function.name
            if tc.function.arguments:
                # Arguments arrive as JSON fragments; concatenate, parse at the end
                slot["args"] += tc.function.arguments
    if delta.content:
        print(delta.content, end="")

# After stream ends, tool_call_accumulator has the full tool calls.
Most apps don't need streaming for tool calls; non-streaming is simpler. Stream only if your UX shows partial output while the model decides.
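Once the stream ends, the accumulated argument strings still need to be parsed before you can dispatch anything. A minimal sketch of that final step, assuming the accumulator layout used above ({"id", "name", "args"} keyed by index). The name finalize_tool_calls is hypothetical, and treating an empty argument string as {} is an assumption for tools that take no arguments.

```python
import json


def finalize_tool_calls(accumulator):
    """Turn the index-keyed accumulator into an ordered list of parsed calls."""
    calls = []
    for idx in sorted(accumulator):
        slot = accumulator[idx]
        try:
            # Assumption: an empty string means the tool was called with no arguments.
            arguments = json.loads(slot["args"] or "{}")
        except json.JSONDecodeError:
            arguments = None  # fragments never formed valid JSON (e.g. stream cut off)
        calls.append({"id": slot["id"], "name": slot["name"], "arguments": arguments})
    return calls


# Simulated accumulator state after a stream with two tool calls.
accumulated = {
    1: {"id": "call_2", "name": "get_weather", "args": '{"city": "Sura' + 'baya"}'},
    0: {"id": "call_1", "name": "get_weather", "args": '{"city": "Jakarta"}'},
}
parsed = finalize_tool_calls(accumulated)
```

A call whose arguments come back as None should be reported to the model as a parse failure, not executed.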
Cost considerations
Each tool-use round costs at minimum: prompt + tool definition tokens + tool result tokens + completion tokens. Multi-round chains can stack quickly. Three tactics to keep cost down:
- Trim tool definitions: only pass tools relevant to the current query. If your bot has 50 tools, dynamically include the 5 most likely.
- Cache the tool definitions in your system prompt via cache_control. Tools themselves don't yet hit the cache rate, but the system prompt around them does.
- Compress tool results: don't dump 100KB of raw JSON. Summarize / truncate before feeding back.
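Trimming and compressing can be sketched with two small helpers. Both are illustrative assumptions: select_tools uses naive keyword overlap as a stand-in for whatever relevance ranking you actually have, compact_result truncates by characters rather than tokens, and neither name comes from an SDK.

```python
import json


def select_tools(tools, query, k=5):
    """Keep the k tools whose name/description share the most words with the query.

    Naive keyword overlap, used here only as a stand-in for a real relevance
    ranker (embeddings, a router model, hand-written rules).
    """
    query_words = set(query.lower().split())

    def score(tool):
        fn = tool["function"]
        text = f"{fn['name'].replace('_', ' ')} {fn.get('description', '')}".lower()
        return len(query_words & set(text.split()))

    # sorted() is stable, so ties keep their original order.
    return sorted(tools, key=score, reverse=True)[:k]


def compact_result(result, max_chars=2000):
    """Serialize a tool result compactly and truncate it to a character budget."""
    text = json.dumps(result, ensure_ascii=False, separators=(",", ":"))
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"... [truncated {len(text) - max_chars} chars]"


all_tools = [
    {"type": "function", "function": {"name": "lookup_order", "description": "Look up an order by order id"}},
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city"}},
    {"type": "function", "function": {"name": "send_email", "description": "Send an email to a customer"}},
]
picked = select_tools(all_tools, "what is the weather in jakarta", k=1)
small = compact_result({"rows": list(range(1000))}, max_chars=50)
```

A truncated result is no longer valid JSON, which is usually fine because the model reads it as text. If a downstream step needs structure, summarize into a smaller object instead of cutting the string.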
Related
- Cookbook: function calling - more concrete patterns.
- Cookbook: agentic chains - multi-step tool loops with retry/recovery.
- Structured output guide - when to use response_format vs tool calling.