This document outlines the complete flow of a tool call from a client application through Ollama and back.

Note: I will refer to OpenAPI v3 as Swagger to avoid confusion with OpenAI.

[Diagram: steps 0–11 of the tool call flow, passing between the Client (OpenWebUI), the Inference Engine (Ollama), the LLM, and the MCP server. Each numbered step below corresponds to a hop in the diagram.]

Step 0: Tool Discovery

The OpenWebUI client retrieves available tools from the MCP server.
In this particular case, the MCP server is using FastAPI and responds with the Swagger JSON that OpenWebUI expects:

{
  "openapi": "3.1.0",
  "info": {
    "title": "FastAPI",
    "version": "0.1.0"
  },
  "paths": {
    "/get_weather": {
      "get": {
        "summary": "Get Weather",
        "parameters": [
          {
            "name": "location",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string",
              "description": "Location to retrieve weather for"
            }
          }
        ]
      }
    }
  }
}
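
For reference, a minimal FastAPI sketch that would produce a schema like the one above (the hard-coded weather values are the dummy data used later in this walkthrough; a real server would query an actual weather backend):

import uvicorn
from fastapi import FastAPI, Query

app = FastAPI()

# FastAPI derives the "Get Weather" summary and the query parameter schema
# automatically and serves the resulting document at /openapi.json.
@app.get("/get_weather")
def get_weather(location: str = Query(..., description="Location to retrieve weather for")):
    return {"temperature": 102.4, "location": location, "unit": "fahrenheit"}

if __name__ == "__main__":
    uvicorn.run(app)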

Step 1: User Request with Tool Availability

OpenWebUI constructs an OpenAI-format payload containing the user prompt and available tools.
User prompt: What's the weather in Austin, TX?

{
  "model": "qwen-2.5:32b",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Austin, TX?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "Location to retrieve weather for"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
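
Nothing in this payload is OpenWebUI-specific; any client can send the same request to Ollama's OpenAI-compatible endpoint. A minimal sketch, assuming Ollama is listening on its default port:

import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "Location to retrieve weather for"}
            },
            "required": ["location"],
        },
    },
}]

payload = {
    "model": "qwen-2.5:32b",
    "messages": [{"role": "user", "content": "What's the weather in Austin, TX?"}],
    "tools": tools,
    "tool_choice": "auto",
}

# Ollama exposes an OpenAI-compatible API under /v1.
response = requests.post("http://localhost:11434/v1/chat/completions", json=payload)
print(response.json()["choices"][0]["message"])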

Step 2: Ollama Message Formatting

Ollama receives the request on its OpenAI-compatible endpoint and formats the messages using the chat template defined for the model.
In this case, I'm using a Qwen 2.5 32B Instruct model; its chat template, defined in the tokenizer_config.json file, is a Jinja template.

Note: The above Qwen model's chat template can handle both top-level tool definitions and OpenAI's nested "function" structure:
{%- if tool_call.function is defined %}
    {%- set tool_call = tool_call.function %}
{%- endif %}

The rendered chat template would roughly resemble:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:

<tools>
{"name": "get_weather", "description": "Get the current weather for a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "Location to retrieve weather for"}}, "required": ["location"]}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>

<|im_end|>
<|im_start|>user
What's the weather in Austin, TX?
<|im_end|>
<|im_start|>assistant
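
If you want to inspect this rendering yourself, the Hugging Face tokenizer applies the same Jinja template. A sketch assuming the transformers library and the Qwen/Qwen2.5-32B-Instruct checkpoint (Ollama maintains its own copy of the template, so treat this as an approximation):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "Location to retrieve weather for"}
            },
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Austin, TX?"}]

# tokenize=False returns the rendered prompt string instead of token IDs;
# add_generation_prompt appends the trailing <|im_start|>assistant header.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)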

Step 3: LLM Tool Call Decision

The LLM analyzes the user request and determines that a tool call is required. It outputs the desired tool call within the XML tags as specified by the chat template:

I'll help you get the current weather in Austin, TX.

<tool_call>
{"name": "get_weather", "arguments": {"location": "Austin, TX"}}
</tool_call>

Step 4: Ollama Response Formatting

Ollama parses the tool call block out of the LLM output and formats the response back into OpenAI-compatible format with the tool_calls array populated:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-2.5:32b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'll help you get the current weather in Austin, TX.",
        "tool_calls": [
          {
            "id": "get_weather_1",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Austin, TX\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
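
Conceptually, the extraction step resembles the following (a Python sketch for illustration only; Ollama's actual parser is written in Go and handles more edge cases):

import json
import re

# Matches the <tool_call>...</tool_call> blocks the chat template asks for.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(llm_output: str):
    """Split raw LLM output into plain content plus OpenAI-format tool calls."""
    tool_calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(llm_output)):
        call = json.loads(match.group(1))
        tool_calls.append({
            "id": f"{call['name']}_{i + 1}",
            "type": "function",
            "function": {
                "name": call["name"],
                # The OpenAI format carries arguments as a JSON string.
                "arguments": json.dumps(call["arguments"]),
            },
        })
    content = TOOL_CALL_RE.sub("", llm_output).strip()
    return content, tool_calls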

Step 5: MCP Server Call

OpenWebUI receives Ollama's response, detects the tool call request, and makes an HTTP request to the MCP server using the endpoint and parameters matching the Swagger schema in step 0:

GET /get_weather?location=Austin%2C%20TX
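
The dispatch amounts to mapping the tool call's name and arguments onto the endpoint described by the schema; a hedged sketch (OpenWebUI's real routing logic is more involved):

import json
import requests

def dispatch_tool_call(base_url: str, tool_call: dict) -> str:
    """Invoke the MCP server endpoint named by an OpenAI-format tool call."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    # The step-0 schema declares a GET endpoint with query parameters.
    response = requests.get(f"{base_url}/{name}", params=args)
    return response.text  # becomes the content of the "tool" message in step 7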

Step 6: MCP Server Response

The MCP server performs whatever black-box shenanigans are needed to determine the weather in Austin, TX and returns a response matching the Swagger schema from step 0:

{
  "temperature": 102.4,
  "location": "Austin, TX",
  "unit": "fahrenheit"
}

Step 7: Tool Response

OpenWebUI receives the MCP server response, adds it to the conversation history as a tool response, and sends the updated conversation back to Ollama:

{
  "model": "qwen-2.5:32b",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Austin, TX?"
    },
    {
      "role": "assistant",
      "content": "I'll help you get the current weather in Austin, TX.",
      "tool_calls": [
        {
          "id": "get_weather_1",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Austin, TX\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "get_weather_1",
      "content": "{\"temperature\": 102.4, \"location\": \"Austin, TX\", \"unit\": \"fahrenheit\"}"
    }
  ]
}
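
In code, this is just two appends to the running message list before the follow-up request (continuing the sketch from step 1):

# Echo the assistant's tool call back into the history, then attach the
# MCP server's result under the "tool" role, keyed by tool_call_id.
assistant_msg = response.json()["choices"][0]["message"]
payload["messages"].append(assistant_msg)
payload["messages"].append({
    "role": "tool",
    "tool_call_id": assistant_msg["tool_calls"][0]["id"],
    "content": '{"temperature": 102.4, "location": "Austin, TX", "unit": "fahrenheit"}',
})
response = requests.post("http://localhost:11434/v1/chat/completions", json=payload)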

Step 8: Ollama Tool Response Formatting

Ollama formats the messages including the new tool response according to the model's chat template.

Note: Although the input at this stage carries the tool responses under the tool role, this particular model/chat template renders them in the prompt under the user role.
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

# Tools
[... tool definitions ...]

<|im_end|>
<|im_start|>user
What's the weather in Austin, TX?
<|im_end|>
<|im_start|>assistant
I'll help you get the current weather in Austin, TX.

<tool_call>
{"name": "get_weather", "arguments": {"location": "Austin, TX"}}
</tool_call>
<|im_end|>
<|im_start|>user
{"temperature": 102.4, "location": "Austin, TX", "unit": "fahrenheit"}
<|im_end|>
<|im_start|>assistant
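
The template branch responsible for this behavior looks roughly like the following (a simplified paraphrase, not a verbatim excerpt; see the tokenizer_config.json for the full logic):

{%- elif message.role == "tool" %}
    {{- '<|im_start|>user\n' }}
    {{- message.content }}
    {{- '<|im_end|>\n' }}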

Step 9: LLM Final Response

The LLM processes the formatted conversation, now containing the tool response data, and generates a natural language response using the weather information:

Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!

Step 10: Ollama Final Response Formatting

Ollama formats the LLM's natural language response back into OpenAI-compatible format for return to OpenWebUI:

{
  "id": "chatcmpl-def456",
  "object": "chat.completion",
  "created": 1234567920,
  "model": "qwen-2.5:32b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!"
      },
      "finish_reason": "stop"
    }
  ]
}

Step 11: User-Facing Response

OpenWebUI receives the final formatted response from Ollama and displays the natural language message to the user in the chat interface:

Based on the current weather data, it's quite hot in Austin, TX right now with a temperature of 102.4°F. Make sure to stay hydrated and seek air conditioning if you're planning to be outside!