The Protocol That Cleaned Up Our Agent Architecture
A few weeks ago someone from the data team asked whether we could update a database schema that is populated by one of the tools in our multi-agent system. The change itself was simple: two new columns on one table. The tool definition, however, lived in the agent orchestrator. A second, similar copy lived in the validation agent. A third, slightly different and out-of-date copy sat in a utility module someone had written three sprints earlier. The human-in-the-loop approval logic was wired directly into graph edges, with one custom implementation per tool. Changing the schema meant touching four files, re-testing each agent separately, and hoping nothing downstream broke silently. We fixed it, but it raised a serious question: why had we built it this way?
The honest answer is that we had no alternative. Tool calling in LangGraph is a local concern by design. You define tools where you need them, call them where you need them, and own all the plumbing. That is manageable with two agents. It becomes a problem when seven agents share overlapping tools behind a human gate. After some research we decided to stop defining tools locally in each agent and instead host them all in one shared place that any agent can use.
What is MCP?
The Model Context Protocol (MCP) is an open standard published by Anthropic in late 2024. It standardises how an AI agent discovers and calls tools. Instead of defining tools inside the orchestrator, you run them on a separate server. The agent connects to that server at runtime, asks which tools are available, and gets back a list.
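To make the discovery step concrete, here is a minimal sketch of the two JSON-RPC 2.0 messages involved: `tools/list` to ask what is available and `tools/call` to invoke one tool. The method names and the `name` / `description` / `inputSchema` fields follow the MCP specification. The `insert_order` tool, its arguments, and the canned server reply are hypothetical, and in practice an MCP client library builds and sends these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request of the shape MCP uses on the wire."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def tool_names(list_response):
    """Pull the tool names out of a tools/list response."""
    return [tool["name"] for tool in list_response["result"]["tools"]]


# Step 1: the agent asks the server which tools it offers.
list_request = make_request("tools/list")

# Hypothetical reply from the server: one tool, described by a JSON Schema.
list_response = {
    "jsonrpc": "2.0",
    "id": list_request["id"],
    "result": {
        "tools": [
            {
                "name": "insert_order",
                "description": "Insert one order row.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "amount": {"type": "number"},
                    },
                    "required": ["order_id", "amount"],
                },
            }
        ]
    },
}

# Step 2: the agent calls one of the advertised tools by name.
call_request = make_request(
    "tools/call",
    {"name": "insert_order", "arguments": {"order_id": "A-17", "amount": 42.5}},
)

print(json.dumps(list_request))
print(tool_names(list_response))
print(json.dumps(call_request))
```

The point of the sketch is that the tool's schema travels with the server's reply, so a schema change is made once on the server and every connected agent sees it on the next `tools/list`.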
A senior engineer reading this article will immediately ask: couldn't I just build a centralised tool registry and inject it into each agent at startup? I asked myself the same thing, and in another system I did use a tool registry instead of MCP. Yes, you could, and if you already have something like that working, MCP is not an emergency. What a bespoke registry doesn't give you is an interoperability boundary. MCP is a protocol, not a library. Any MCP-compatible client can connect to your server: LangGraph today, a different framework next year. A TypeScript client can call your Python server without any extra integration work. A tool registry tied to one codebase can't do that. There is also a team ownership point. In our case the ML team owned the tools and the application team owned the graph. MCP gave them a clean contract without a shared codebase.
Building the MCP Server
An MCP server can expose three things: Tools (callable actions), Resources (read-only data), and Prompts (reusable templates). For an agentic system that needs to take actions, tools are the primary concern. The Python SDK ships with FastMCP, which generates the tool schema from type hints and manages the protocol lifecycle. You write a function, decorate it with the tool decorator, and the server takes care of the rest. One thing catches people out with stdio transport: never write to stdout. With stdio, the protocol uses stdout as its communication channel, so any stray print() call corrupts the message stream in ways that are very confusing to debug. Send logs to stderr instead.
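A minimal sketch of such a server might look like the following. It assumes the official `mcp` Python package is installed. The server name `orders-tools` and the `insert_order` tool are hypothetical stand-ins for our real tools, and the database write is left out. The FastMCP import is deferred into `build_server()` only so the plain function can be unit-tested without the SDK. In a real module you would more likely create the server at the top and use `@mcp.tool()` as a decorator.

```python
import logging
import sys

# Log to stderr: with stdio transport, stdout is reserved for protocol messages.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("orders-server")


def insert_order(order_id: str, amount: float, currency: str = "EUR") -> str:
    """Insert one order row and report what was written (hypothetical tool)."""
    log.info("insert_order called for %s", order_id)
    # A real tool would write to the database here; this sketch only echoes.
    return f"inserted {order_id}: {amount:.2f} {currency}"


def build_server():
    """Create a FastMCP server and register the tool on it."""
    # Deferred import so insert_order stays testable without the SDK installed.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("orders-tools")  # server name is an assumption
    # Equivalent to decorating insert_order with @mcp.tool(); the type hints
    # and docstring become the tool's input schema and description.
    mcp.tool()(insert_order)
    return mcp


# To serve over stdio (blocks until the client disconnects):
#     build_server().run(transport="stdio")
```

Because the tool is an ordinary typed function, adding the two new columns from the opening example would mean adding two parameters here, in one place.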
Stdio vs HTTP
This decision comes up in every production deployment, and most articles skip over it. Stdio runs the server as a subprocess of the client, and communication happens over standard input and output. Latency is in the single-digit milliseconds, there is no network involved, and setup is minimal. It is the right choice for local development, single-machine deployments, or anywhere the server and client live in the same process tree.
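The earlier warning about stdout follows from how stdio transport frames messages: each JSON-RPC message is a single line of JSON on the server's stdout. The stdlib-only sketch below involves no MCP library. The `read_messages` helper and the sample lines are hypothetical, and a real client may raise or log an error rather than collect bad lines. It shows what a reader on the client side sees when a stray `print()` lands in that stream.

```python
import json


def read_messages(stdout_lines):
    """Parse newline-delimited JSON messages; collect lines that are not JSON."""
    messages, corrupt = [], []
    for line in stdout_lines:
        line = line.strip()
        if not line:
            continue
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            corrupt.append(line)
    return messages, corrupt


# What the client expects: one JSON-RPC message per line.
clean = ['{"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}']

# What it gets if the server prints a debug line to stdout first.
polluted = ["connecting to database..."] + clean

print(read_messages(clean))
print(read_messages(polluted))
```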
Streamable HTTP runs the server as an independent service. Use it when the server needs to be shared across multiple clients or machines, when you want to deploy it as a container, or when you need horizontal scaling. Serverless deployments like Cloud Run work well here. Stdio doesn't fit the serverless model at all, because it assumes a long-lived parent process. Switching between the two in FastMCP is a one-line change to the mcp.run() call: mcp.run(transport="streamable-http", host="0.0.0.0", port=8080). Everything else stays the same.
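One way to keep that one-line switch out of the code entirely is to pick the transport from the environment, so the same server runs over stdio on a laptop and over HTTP in a container. This is a sketch under assumptions: the `MCP_TRANSPORT` and `MCP_HOST` variable names are made up for illustration, and `PORT` is the variable Cloud Run sets for the listening port. Whether `run()` accepts `host` and `port` as keyword arguments depends on which FastMCP you have installed, since some versions take them on the server constructor instead. Check the signature before copying the last line.

```python
def transport_settings(env):
    """Map environment variables to keyword arguments for mcp.run()."""
    transport = env.get("MCP_TRANSPORT", "stdio")  # hypothetical variable name
    if transport == "stdio":
        return {"transport": "stdio"}
    if transport == "streamable-http":
        return {
            "transport": "streamable-http",
            "host": env.get("MCP_HOST", "0.0.0.0"),  # hypothetical variable name
            "port": int(env.get("PORT", "8080")),
        }
    raise ValueError(f"unsupported transport: {transport!r}")


# Local development: nothing set, so the server runs over stdio.
print(transport_settings({}))

# Container deployment: the platform provides the port.
print(transport_settings({"MCP_TRANSPORT": "streamable-http", "PORT": "9000"}))

# With a FastMCP server object `mcp`, following the call shown above:
#     import os
#     mcp.run(**transport_settings(os.environ))
```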