Tool Calling, Explained: How AI Agents Decide What to Do Next
Understanding how LLMs interact with the world around them, from returning data to taking action.
In my latest post, we talked about how to get structured, machine-readable responses from an LLM using JSON Mode, function calling, and structured outputs. There, we briefly touched on function calling, treating it as one way of obtaining structured responses. However, function calling goes well beyond getting structured data back from a model: it is essentially the backbone of agentic AI workflows. So, in today’s post, we are going to take a closer look at exactly this topic.
In all of the examples we have covered so far, the LLM is used only as a passive responder: it receives a question, generates an answer, and that’s it. But what if we want the LLM not just to say something but to do something? More precisely, what if we want an action to be triggered based on the model’s response? This action could be anything: looking up live data, sending a message, querying a database, calling an external API, and so on. This is what tool calling makes possible.
Tool calling is what transforms an LLM from a very smart text generator into something that can actually trigger actions and interact with the world around it. So, let’s take a look!
What is Tool Calling?
Tool calling (also called function calling) is the mechanism by which an LLM can request the execution of external functions or APIs as part of generating its response. In other words, instead of just returning text, the model can respond to the user’s request by asking for a specific function to be called with specific arguments.
The key thing to understand here is that the model itself does not execute the tool. It only decides which tool to call and with what arguments. The selected tool is actually executed by our own code, the same application code that sends the request to the AI model. We then feed the tool’s result back to the AI model, which uses it to generate a final response to the user.
This is the tool calling loop, which includes the following steps:
- The user submits a message.
- The AI model takes the message as input and produces an output, which is essentially a decision on which tool to utilise and with which arguments.
- The model’s response, containing the selected tool and the arguments to use, is passed back to our code.
- Our code, with no involvement from the AI model, executes the selected tool with the selected arguments.
- This execution produces some kind of result (e.g., a calculation, information obtained from an API, etc.), and this result is then passed back to the AI model.
- The AI model takes as input the result of the tool and produces a final response to the user based on that.
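The loop above can be sketched in plain Python. To keep it runnable offline, the model is replaced by a hypothetical fake_model function that follows a fixed script, and get_current_weather returns made-up data. The message dictionaries are simplified for illustration and do not match any particular provider's format.

```python
import json
from typing import Any, Callable

# Hypothetical tool: returns fixed fake data so the sketch runs offline.
def get_current_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 21.0, "unit": unit}

# Our code owns the mapping from declared tool names to real callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_current_weather": get_current_weather}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: asks for a tool first, then answers from its result."""
    last = messages[-1]
    if last["role"] == "user":
        # Steps 2-3: the "model" only describes a call; arguments arrive as a JSON string.
        return {"tool_call": {"name": "get_current_weather",
                              "arguments": json.dumps({"city": "Athens"})}}
    # Step 6: the "model" writes the final answer from the tool result.
    result = json.loads(last["content"])
    return {"content": f"It is {result['temperature']} degrees {result['unit']} in {result['city']}."}

def run_turn(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]  # Step 1
    for _ in range(max_steps):  # cap iterations so a misbehaving model cannot loop forever
        reply = fake_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # no further tool calls: this is the final response
        call = reply["tool_call"]
        messages.append({"role": "assistant", "tool_call": call})
        # Step 4: our code, not the model, executes the selected tool.
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        # Step 5: the result goes back to the model as a new message.
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("model kept requesting tools; giving up")

print(run_turn("What's the weather like in Athens right now?"))
# → It is 21.0 degrees celsius in Athens.
```

Note that the model side never touches TOOLS: it only emits a name and a JSON string, and everything that actually runs happens inside run_turn.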
Again, the model generates a tool call, not a tool execution. The two are very different things, and conflating them is one of the most common sources of confusion.
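Because a tool call is only a request, our code can inspect it before deciding to run anything. The following is a minimal sketch, assuming a hypothetical weather tool whose parameters are a required string city and an optional unit restricted to celsius or fahrenheit. check_arguments is a hand-rolled illustration that covers only a few JSON Schema keywords. It is not a full validator; a library such as jsonschema would be the more complete option.

```python
import json

# Assumed parameter schema for a hypothetical weather tool.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

def check_arguments(arguments_json: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid.

    Only checks JSON syntax, required keys, unknown keys, string types and enums.
    """
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    props = schema.get("properties", {})
    problems = [f"missing required argument: {k}"
                for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} should be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems

# The model's output is just text until we choose to act on it.
print(check_arguments('{"city": "Athens", "unit": "kelvin"}', WEATHER_PARAMS))
```

Only when the returned list is empty would our code go on to execute the tool; otherwise it could send the problems back to the model instead.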
1. A single tool: weather API
I think the most common example of AI tool use that comes to mind is a weather API (the cornerstone of custom, live data), so let’s imagine we’re building a weather assistant. In particular, we want a mechanism in which, when the user asks about the weather, the AI model does not just make something up (which the model would very happily do 🙃) but instead calls a real weather function and gets actual weather data from a source outside the LLM.
To get the weather data, I will be using Open-Meteo, a free, open-source weather API that, happily, requires no API key. To use a tool, we first have to declare it in the tools list that is sent to the model along with the request.
from openai import OpenAI
import json
client = OpenAI(api_key="your_api_key")
# Step 1: define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. Athens"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["city"]
            }
        }
    }
]
Notice that the actual tool to be used (the weather API) is mentioned nowhere up to this point. Instead, the model decides which tool to call based on three things: the function name and description (“Get the current weather for a given city”), the parameter descriptions (“The name of the city, e.g. Athens”), and the parameter schema (types, allowed enum values, and required fields). It is purely from this information that the model figures out whether this is the right tool for a given user message and with what arguments.
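Since the definition says nothing about how get_current_weather works, its implementation lives entirely in our own code. Here is one possible sketch using only the Python standard library. It assumes Open-Meteo's geocoding endpoint (geocoding-api.open-meteo.com/v1/search) to turn the city name into coordinates, and its forecast endpoint (api.open-meteo.com/v1/forecast) with the current_weather and temperature_unit query parameters; it is worth checking these against the current Open-Meteo documentation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GEOCODING_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

def build_geocoding_url(city: str) -> str:
    # Ask for the single best match for the city name.
    return GEOCODING_URL + "?" + urllib.parse.urlencode({"name": city, "count": 1})

def build_forecast_url(latitude: float, longitude: float, unit: str = "celsius") -> str:
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",  # request the current-conditions block
        "temperature_unit": unit,   # assumed to accept "celsius" or "fahrenheit"
    }
    return FORECAST_URL + "?" + urllib.parse.urlencode(params)

def _get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def get_current_weather(city: str, unit: str = "celsius") -> dict:
    """Resolve the city to coordinates, then fetch its current weather (needs network)."""
    places = _get_json(build_geocoding_url(city)).get("results")
    if not places:
        return {"error": f"City not found: {city}"}
    place = places[0]
    forecast = _get_json(build_forecast_url(place["latitude"], place["longitude"], unit))
    current = forecast["current_weather"]
    return {
        "city": place["name"],
        "temperature": current["temperature"],
        "unit": unit,
        "windspeed": current["windspeed"],
    }
```

Returning a small dictionary (rather than the raw API payload) keeps the tool result compact when it is later fed back to the model, and returning an error dictionary instead of raising lets the model explain the failure to the user.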
Thus, writing clear and accurate descriptions when defining our tools is key to helping the model identify and call the right tool for the user’s input. Once the tools variable is defined, we can make a request to the AI model:
# Step 2: send the user message along with the tool definition
messages = [
    {"role": "user", "content": "What's the weather like in Athens right now?"}
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)
print(response.choices[0])
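At this point, the printed choice should hold a tool call rather than a text answer: typically its finish_reason is "tool_calls" and message.tool_calls lists the requested calls. A minimal sketch of the next steps of the loop follows, assuming the OpenAI Python SDK's message shape (each tool call has an id, a function.name, and a function.arguments JSON string). The run_tool_calls function and its registry argument are hypothetical names, not part of the SDK.

```python
import json

def run_tool_calls(message, registry: dict) -> list[dict]:
    """Execute each tool call in an assistant message and build 'tool' messages.

    `message` is expected to look like response.choices[0].message from the
    OpenAI Python SDK; `registry` maps tool names to our own Python functions.
    """
    tool_messages = []
    for call in message.tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            # The arguments are a JSON string produced by the model; our code runs the tool.
            result = func(**json.loads(call.function.arguments))
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # links this result to the model's request
            "content": json.dumps(result),
        })
    return tool_messages
```

With the real client, one would append the assistant message itself to messages, extend messages with the tool messages returned here (for example with a registry like {"get_current_weather": get_current_weather}), and call client.chat.completions.create again with the same model and tools to get the final, natural-language answer.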