AI Agents Explained: What Is a ReAct Loop and How Does It Work?
AI Agents Explained: What Is a ReAct Loop and How Does It Work?
AI 智能体详解:什么是 ReAct 循环及其工作原理?
How agents reason, act, and observe their way to a final answer, one step at a time. 智能体如何通过推理、行动和观察,一步步得出最终答案。
In my last post, we talked about Tool Calling. Tool Calling is the mechanism that allows an AI model to decide which function needs to be used and with what arguments, instead of just generating text as output. 在上一篇文章中,我们讨论了“工具调用”(Tool Calling)。工具调用是一种允许 AI 模型决定使用哪个函数以及使用什么参数的机制,而不是仅仅输出生成的文本。
By the end of that post, we had a setup that could decide to get_current_weather or convert_currency, or do both at once by calling them in parallel, or neither of them, and just generate text. In other words, the model decides what it needs to do next, we (the rest of the code) execute that decision, pass back the result to the model, and the model ultimately provides an informed answer to the user in text format.
在那篇文章的结尾,我们建立了一个系统,它可以决定调用 get_current_weather(获取当前天气)或 convert_currency(货币转换),或者通过并行调用同时执行两者,亦或两者都不调用而直接生成文本。换句话说,模型决定下一步要做什么,我们(其余代码)执行该决定,将结果传回给模型,最终模型以文本格式向用户提供基于事实的回答。
A more advanced version of this loop doesn’t stop after just one round of model deciding – code executing – passing back the result – model answering. Instead of generating a response at the end, the model can use the result of one tool call to decide whether, and which, tool to call next. 这种循环的一个更高级版本不会在“模型决策—代码执行—结果回传—模型回答”这一轮结束后就停止。模型不再是在最后生成响应,而是可以利用一次工具调用的结果来决定是否需要调用下一个工具,以及调用哪个工具。
As already mentioned at the end of the Tool Calling post, this is a ReAct loop (Reason + Act), and is exactly what lets agents handle tasks that can’t be solved in a single call. 正如在“工具调用”文章末尾所提到的,这就是 ReAct 循环(推理 + 行动),它正是让智能体能够处理那些无法通过单次调用解决的任务的关键。
But what would such a task be? In the previous post’s parallel calling example, we asked “What’s the weather in Athens and how much is 100 USD in EUR?”, which are two separate things requiring the use of two separate tools to obtain a response, but are also independent from one another. 那么,什么样的任务属于这种情况呢?在上一篇文章的并行调用示例中,我们问了“雅典的天气如何,以及 100 美元等于多少欧元?”,这是两件独立的事情,需要使用两个不同的工具来获取响应,且它们之间互不干扰。
In other words, we can answer those two questions independently, concurrently, without needing any information from the first question in order to reply to the second one. But what if we ask something like “I bet my friend 100 EUR that it would rain in Athens today. If I won, how many USD is that?” 换句话说,我们可以独立、并发地回答这两个问题,无需从第一个问题中获取任何信息来回答第二个问题。但如果我们问类似“我和朋友打赌 100 欧元,赌雅典今天会下雨。如果我赢了,那是多少美元?”这样的问题呢?
Here, the model won’t be able to decide if it needs to call convert_currency until it first calls get_current_weather and finds out whether it actually rained. Simply put, the answer to the second question depends entirely on the outcome of the first.
在这种情况下,模型在调用 get_current_weather 并确认是否真的下雨之前,无法决定是否需要调用 convert_currency。简而言之,第二个问题的答案完全取决于第一个问题的结果。
This is precisely the kind of dependency that parallel tool calling can’t resolve in one round, and exactly what a ReAct loop is built for. So, let’s take a look! 这正是并行工具调用无法在单轮内解决的依赖关系,也是 ReAct 循环专门为之构建的场景。那么,让我们一探究竟吧!
What exactly is a ReAct loop?
究竟什么是 ReAct 循环?
A ReAct loop is just three steps repeated in sequence: Reason, Act, Observe. ReAct 循环只是按顺序重复的三个步骤:推理(Reason)、行动(Act)、观察(Observe)。
At the beginning of the loop, the model reasons about what information it already knows and what additional information is missing in order to provide a correct response to the user’s query. It then acts by calling an appropriate tool with the purpose of obtaining this missing information. 在循环开始时,模型会推理它已知的信息,以及为了向用户提供正确回答还需要哪些额外信息。然后,它通过调用适当的工具来获取这些缺失的信息,从而采取行动。
Finally, once the respective tool call is executed and its result is passed back to the model, the model observes the result (adds the tool’s result into its context). Then, it loops back to reasoning again, except this time with this new observation sitting in its context. 最后,一旦相应的工具调用执行完毕并将结果传回给模型,模型就会观察该结果(将工具的输出添加到其上下文中)。接着,它会回到推理阶段,只不过这次它的上下文中已经包含了新的观察结果。
This loop is repeated until the model evaluates that the available information is enough for answering the user’s query, and at this point, it stops calling tools and just responds with text. 这个循环会不断重复,直到模型评估认为现有信息足以回答用户的问题,此时它将停止调用工具,直接以文本形式进行回复。
But isn’t this like the same as the tool calls we already know? Kind of, but not exactly. The part that makes this different from what we covered in the Tool Calling post is the loop itself. 但这和我们已经了解的工具调用不是一回事吗?有点像,但又不完全相同。它与我们在“工具调用”文章中讨论的内容的区别在于循环本身。
In a single tool call, the model asks for something, gets it, and that’s the end of the transaction as far as that call is concerned. In the ReAct loop, the conversation remains open, as each new observation becomes new context for the next reasoning step, and the model can change its plan based on what it just learned. 在单次工具调用中,模型请求某样东西,获取它,对于该次调用而言,交易就结束了。而在 ReAct 循环中,对话保持开放状态,因为每一次新的观察都会成为下一步推理的新上下文,模型可以根据刚刚学到的内容调整其计划。
Same Tools, New Trick
同样的工具,新的技巧
To make this concrete, let’s go back to the bet example from the intro and think through what the model actually needs to do in order to provide us a reliable answer. 为了具体说明这一点,让我们回到开篇的打赌示例,思考模型为了给我们提供可靠的答案,实际上需要做什么。
The question is: “I bet my friend 100 EUR that it would rain in Athens today. If I won, how many USD is that?” 问题是:“我和朋友打赌 100 欧元,赌雅典今天会下雨。如果我赢了,那是多少美元?”
Notice the conditional statement in the middle of it: “if I won.” Whether the model needs to convert any currency at all depends on what the weather call returns. If it rained, the model needs to call convert_currency with 100 EUR as an input parameter and give back the converted winnings. If it didn’t rain, the bet is lost, convert_currency is irrelevant, and the model should just directly return the respective text, without making a second call.
注意中间的条件语句:“如果我赢了”。模型是否需要进行货币转换,完全取决于天气查询的返回结果。如果下雨了,模型需要以 100 欧元作为输入参数调用 convert_currency 并返回转换后的奖金。如果没有下雨,则赌局输了,convert_currency 就无关紧要了,模型应该直接返回相应的文本,而无需进行第二次调用。
To put it differently, the model genuinely cannot plan its full sequence of tool calls upfront. It has to check the weather first, observe the result, reason about what that result implies for the bet condition, and only then decide whether a second tool call is needed. 换句话说,模型确实无法预先规划其完整的工具调用序列。它必须先检查天气,观察结果,推理该结果对打赌条件意味着什么,然后才能决定是否需要第二次工具调用。
Unlike the parallel tool calling that worked well for answering “What’s the weather in Athens and how much is 100 USD in EUR?”, this question requires a loop. The nice thing about a ReAct loop is that it doesn’t need new tools. We can still use the same functions, just in a different manner. 与回答“雅典天气如何,100 美元等于多少欧元?”时表现良好的并行工具调用不同,这个问题需要一个循环。ReAct 循环的好处在于它不需要新的工具。我们仍然可以使用相同的函数,只是以不同的方式使用它们。