From Local LLM to Tool-Using Agent

从本地大模型到工具调用智能体

LLM Applications: From Local LLM to Tool-Using Agent. Using Gemma 4, Ollama, OpenAI Agents SDK, and Tavily MCP to build a lightweight research agent. 大模型应用：从本地大模型到工具调用智能体。使用 Gemma 4、Ollama、OpenAI Agents SDK 和 Tavily MCP 构建轻量级研究智能体。

You have just deployed a local LLM. Nice. But after the first few chats, you might be wondering: what else can I do with it? Well, how about making the local LLM agentic with some tool use? In this post, we’ll explore how to turn a local LLM into a tool-using agent. 你刚刚部署了一个本地大模型，很棒。但在最初的几次对话之后，你可能会想：我还能用它做什么呢？那么，通过工具调用让本地大模型具备“智能体”能力如何？在这篇文章中，我们将探讨如何将本地大模型转变为一个能够使用工具的智能体。

Specifically, we’ll use: 具体来说，我们将使用：

Gemma 4 model (edge-friendly variants) as our local LLM
Gemma 4 模型（边缘计算友好版本）作为本地大模型
Ollama for serving the local LLM
Ollama 作为本地大模型的服务框架
OpenAI Agents SDK for the agent runtime
OpenAI Agents SDK 作为智能体运行时
Tavily web search MCP as one example of the external tool
Tavily 网络搜索 MCP 作为外部工具的一个示例

We’ll build a mini deep research agent that can search the web, gather the evidence, and synthesize an answer with citations, given a user question. By the end of the post, you’d have a working local deep research agent and a reusable implementation pattern for turning a local model into a local AI agent. 我们将构建一个微型深度研究智能体，它能够根据用户的问题进行网络搜索、收集证据，并合成带有引用的答案。读完本文，你将拥有一个可运行的本地深度研究智能体，以及一套将本地模型转化为本地 AI 智能体的可复用实现模式。

1. Set Up the Local Agent Stack

1. 设置本地智能体技术栈

We need to prepare 4 pieces before we write the code: Ollama, Gemma 4 (specifically the Gemma 4 E4B model), OpenAI Agents SDK, and Tavily MCP. 在编写代码之前，我们需要准备四个部分：Ollama、Gemma 4（特别是 Gemma 4 E4B 模型）、OpenAI Agents SDK 和 Tavily MCP。

First, let’s install Ollama. On Windows, you can download the installer from the official Ollama website: https://ollama.com/download. Or use winget in PowerShell: winget install Ollama.Ollama. On Linux, Ollama can be installed with: curl -fsSL https://ollama.com/install.sh | sh. 首先，安装 Ollama。在 Windows 上，你可以从 Ollama 官网下载安装程序：https://ollama.com/download，或者在 PowerShell 中使用 winget install Ollama.Ollama。在 Linux 上，可以通过 curl -fsSL https://ollama.com/install.sh | sh 进行安装。

After installation, please check: ollama --version. On Windows, remember to launch Ollama from the Start menu. Once it is running, the local API endpoint is available. 安装完成后，请检查：ollama --version。在 Windows 上，记得从开始菜单启动 Ollama。一旦运行，本地 API 端点即可使用。

Next, we pull the local model. Here, we use Gemma 4 E4B variant: ollama pull gemma4:e4b. 接下来，拉取本地模型。这里我们使用 Gemma 4 E4B 版本：ollama pull gemma4:e4b。

Gemma 4 has several variants. The E4B model is a good fit for our purpose, as it is designed with edge/local agentic workflows in mind. My machine has an NVIDIA RTX 2000 Ada Laptop GPU with about 8 GB VRAM. If your machine is more constrained, you can try the lighter E2B variant: ollama pull gemma4:e2b. Gemma 4 有多个版本。E4B 模型非常适合我们的目的，因为它专为边缘/本地智能体工作流而设计。我的机器配备了 NVIDIA RTX 2000 Ada 笔记本 GPU，拥有约 8GB 显存。如果你的机器配置较低，可以尝试更轻量的 E2B 版本：ollama pull gemma4:e2b。

Next, we need the agent runtime library. For that, we use OpenAI Agents SDK: pip install openai-agents. You would also need the OpenAI-compatible client: pip install openai. 接下来，我们需要智能体运行时库。为此，我们使用 OpenAI Agents SDK：pip install openai-agents。你还需要兼容 OpenAI 的客户端：pip install openai。

Something to note here: later, we’ll point the client to Ollama’s local endpoint, so this does not mean we are sending model calls to OpenAI. 这里需要注意：稍后我们将把客户端指向 Ollama 的本地端点，因此这并不意味着我们将模型调用发送给 OpenAI。

Finally, we need a Tavily MCP endpoint. In case you have not used it before, Tavily is a search API designed for LLM applications. In this post, we use its MCP server so the agent can search the web. You’d need to first create a Tavily account and get an API key. On the Tavily platform, you can directly generate a MCP link with the following shape: https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>. 最后，我们需要一个 Tavily MCP 端点。如果你之前没用过，Tavily 是一个专为大模型应用设计的搜索 API。在本文中，我们使用它的 MCP 服务器，以便智能体能够搜索网络。你需要先创建一个 Tavily 账户并获取 API 密钥。在 Tavily 平台上，你可以直接生成如下格式的 MCP 链接：https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>。

2. Configure the Local Research Agent

2. 配置本地研究智能体

With OpenAI Agents SDK, this is the final Agent object we need to compose: 使用 OpenAI Agents SDK，我们需要构建最终的 Agent 对象如下：

from agents import Agent
agent = Agent(
    name="Local Research Agent",
    instructions=RESEARCH_AGENT_INSTRUCTIONS,
    model=model,
    mcp_servers=[tavily_server],
    mcp_config={"include_server_in_tool_names": True},
)

Let’s unpack each part. 让我们拆解每个部分。

2.1 The Model

2.1 模型

First, the model. 首先是模型。

from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel

MODEL_NAME = "gemma4:e4b"
OLLAMA_BASE_URL = "http://localhost:11434/v1"

client = AsyncOpenAI(
    api_key="ollama",
    base_url=OLLAMA_BASE_URL,
)

model = OpenAIChatCompletionsModel(
    model=MODEL_NAME,
    openai_client=client,
)

We start by creating a client that points at Ollama’s local OpenAI-compatible endpoint. Then, we use OpenAIChatCompletionsModel to wrap the Gemma model into a model object. This allows the Agents SDK to use that model inside the agent loop. Note that the api_key="ollama" value is just a placeholder. Ollama doesn’t really need a real OpenAI API key. We use it because the client expects this field. 我们首先创建一个指向 Ollama 本地兼容 OpenAI 端点的客户端。然后，使用 OpenAIChatCompletionsModel 将 Gemma 模型封装成一个模型对象。这允许 Agents SDK 在智能体循环中使用该模型。注意，api_key="ollama" 只是一个占位符。Ollama 并不真正需要真实的 OpenAI API 密钥，我们使用它是因为客户端要求提供此字段。

2.2 The Instruction

2.2 指令

Next, we define the instruction for the agent with the desired research behavior: 接下来，我们为智能体定义所需研究行为的指令：

from datetime import datetime
CURRENT_DATE = datetime.now().strftime("%B %d, %Y")

RESEARCH_AGENT_INSTRUCTIONS = f"""
[Role] You are a concise research assistant.
[Task] Answer the user's question by turning it into a small web research task. Use the current date when interpreting time-sensitive questions: {CURRENT_DATE}.
[Research behavior] Start with one targeted search query. For recommendation or comparison questions, complete this research loop before answering: first identify the main options, then search for comparison context, then synthesize a recommendation. Use follow-up searches when the first results are insufficient, conflicting, or only cover part of the question. Prefer relevant and credible sources, and track which source supports each important claim. Before answering, check whether the gathered evidence is enough to support the conclusion.
[Expected output] Give a direct answer first, then briefly explain the evidence behind it. Include source links for key factual claims.
[Rules] Do not rely on memory for facts that may have changed. Do not invent...

[角色] 你是一位简洁的研究助手。 [任务] 通过将用户的问题转化为小型网络研究任务来回答问题。在解释时间敏感的问题时，请使用当前日期：{CURRENT_DATE}。 [研究行为] 从一个有针对性的搜索查询开始。对于推荐或比较类问题，在回答前完成此研究循环：首先确定主要选项，然后搜索比较背景，最后综合得出推荐。当初步结果不足、存在冲突或仅涵盖问题的一部分时，请使用后续搜索。优先选择相关且可信的来源，并追踪每个重要主张的来源。在回答之前，检查收集到的证据是否足以支持结论。 [预期输出] 先给出直接回答，然后简要解释背后的证据。为关键事实主张包含来源链接。 [规则] 不要依赖可能已发生变化的事实记忆。不要编造……