The OpenWebUI client retrieves available tools from the MCP server.
In this particular case, the MCP server is built with FastAPI and responds with the OpenAPI (Swagger) JSON that OpenWebUI expects:
{ "openapi": "3.1.0", "info": { "title": "FastAPI", "version": "0.1.0" }, "paths": { "/get_weather": { "get": { "summary": "Get Weather", "parameters": [ { "name": "location", "in": "query", "required": true, "schema": { "type": "string", "description": "Location to retrieve weather for" } } ] } } } }
OpenWebUI constructs an OpenAI-format payload containing the user prompt and available tools.
User prompt: What's the weather in Austin, TX?
{ "model": "qwen-2.5:32b", "messages": [ { "role": "user", "content": "What's the weather in Austin, TX?" } ], "tools": [ { "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "Location to retrieve weather for" } }, "required": ["location"] } } } ], "tool_choice": "auto" }
Ollama receives the request on its OpenAI-compatible endpoint and formats the messages using the chat template defined for the model.
In this case, I'm using a Qwen 2.5 32B Instruct model with a chat template that can be found here. The chat template defined in the tokenizer_config.json file is a Jinja template. Here's an excerpt:
```jinja
{%- if tool_call.function is defined %}
    {%- set tool_call = tool_call.function %}
{%- endif %}
```

Rendered against the request above, the full template produces the following prompt:
```
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"name": "get_weather", "description": "Get the current weather for a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "Location to retrieve weather for"}}, "required": ["location"]}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What's the weather in Austin, TX?<|im_end|>
<|im_start|>assistant
```
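As a sanity check, you can render the same prompt yourself with the Hugging Face transformers tokenizer (a sketch; assumes the Qwen/Qwen2.5-32B-Instruct checkpoint, while Ollama applies its own copy of the template internally):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [{"role": "user", "content": "What's the weather in Austin, TX?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "Location to retrieve weather for"}
            },
            "required": ["location"],
        },
    },
}]

# Renders the Jinja chat template to a string instead of token IDs.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)
```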
The LLM analyzes the user request and determines that a tool call is required. It outputs the desired tool call within the XML tags as specified by the chat template:
```
I'll help you get the current weather in Austin, TX.

<tool_call>
{"name": "get_weather", "arguments": {"location": "Austin, TX"}}
</tool_call>
```
Ollama parses the tool call block from the LLM output and formats the response back into OpenAI-compatible format with the tool_calls array populated:
{ "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1234567890, "model": "qwen-2.5:32b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "I'll help you get the current weather in Austin, TX.", "tool_calls": [ { "id": "get_weather_1", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\":\"Austin, TX\"}" } } ] }, "finish_reason": "tool_calls" } ] }
OpenWebUI receives Ollama's response, detects the tool call request, and makes an HTTP request to the MCP server using the endpoint and parameters matching the Swagger schema from step 0:
```
POST /get_weather
{ "location": "Austin, TX" }
```
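In Python terms, the dispatch boils down to something like this (a sketch, not OpenWebUI's actual code; mcp_server_url is a placeholder):

```python
import json
import requests

def dispatch_tool_call(mcp_server_url: str, tool_call: dict) -> str:
    """Forward a parsed tool call to the matching MCP server endpoint."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    resp = requests.post(f"{mcp_server_url}/{name}", json=args)
    resp.raise_for_status()
    return resp.text  # becomes the content of the "tool" message later
```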
The MCP server performs whatever black-box shenanigans are necessary to determine the weather in Austin, TX and returns a response matching the Swagger schema from step 0:
{ "temperature": 102.4, "location": "Austin, TX", "unit": "fahrenheit" }
OpenWebUI receives the MCP server response, adds it to the conversation history as a tool response, and sends the updated conversation back to Ollama:
{ "model": "qwen-2.5:32b", "messages": [ { "role": "user", "content": "What's the weather in Austin, TX?" }, { "role": "assistant", "content": "I'll help you get the current weather in Austin, TX.", "tool_calls": [ { "id": "get_weather_1", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\":\"Austin, TX\"}" } } ] }, { "role": "tool", "tool_call_id": "get_weather_1", "content": "{\"temperature\": 102.4, \"location\": \"Austin, TX\", \"unit\": \"fahrenheit\"}" } ] }
Ollama formats the messages, including the new tool response, according to the model's chat template. Note that although the OpenAI format marks tool responses with the tool role, this particular model/chat template displays them in the chat under user.
```
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

# Tools

[... tool definitions ...]
<|im_end|>
<|im_start|>user
What's the weather in Austin, TX?<|im_end|>
<|im_start|>assistant
I'll help you get the current weather in Austin, TX.
<tool_call>
{"name": "get_weather", "arguments": {"location": "Austin, TX"}}
</tool_call><|im_end|>
<|im_start|>user
{"temperature": 102.4, "location": "Austin, TX", "unit": "fahrenheit"}<|im_end|>
<|im_start|>assistant
```
The LLM processes the formatted chat message with the tool response data and generates a natural language response using the weather information:
```
Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!
```
Ollama formats the LLM's natural language response back into OpenAI-compatible format for return to OpenWebUI:
{ "id": "chatcmpl-def456", "object": "chat.completion", "created": 1234567920, "model": "qwen-2.5:32b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!" }, "finish_reason": "stop" } ] }
OpenWebUI receives the final formatted response from Ollama and displays the natural language message to the user in the chat interface:
```
Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!
```