One Flexible Tool Beats a Hundred Dedicated Ones


Agentic AI: Why MCP servers keep losing to CLIs once the agent gets a terminal

At the start of 2026, the default move when you wanted an LLM agent to talk to a system was to install an MCP server for it. GitHub, Jira, Slack, Linear, Postgres, Neo4j: each one ships a server that exposes a tidy menu of tools such as create_issue, list_pull_requests, merge_pull_request, get_repository, search_code, and so on. You point your agent at it, and it’s a great onboarding experience. It’s also, for a surprising number of real workloads, the wrong shape.

The thesis is short: MCP design usually wraps each service as a pile of dedicated tools; a CLI hands the agent one really flexible tool. With today’s models, the flexible tool wins.

Figure: Comparison of MCP vs CLI approaches.

The two shapes ask the model to do different work. With a pile of dedicated tools, the agent just has to pick the right one off a menu. With a flexible tool, it has to figure out how to put the pieces together itself. That second part used to be the hard one. Models would hallucinate flags, lose the thread on long pipelines, and misread help text, so wrapping every operation in a pre-baked tool was a sensible defense.

That just isn’t true anymore. Today’s models read a --help page or SKILL.md when they need to, know the canonical CLIs from training, string together bash without supervision, and retry when they get a flag wrong. The hard part got easy, the easy part was always easy, and all those neatly wrapped tools now mostly just bloat the model’s context for nothing.

Of course it’s not all roses and sunshine. Handing the agent a terminal also hands it a much bigger blast radius. The same flexibility that lets it compose gh | jq | xargs into something useful also lets a prompt injection talk it into something a lot worse than a hostile Cypher query. So yes, there’s a trade-off, and you have to actually think about it (sandbox, allowlist, separate OS user, read-only role at the database, the usual stuff). But when you can give the agent a terminal in a reasonably safe way, the flexible side still comes out ahead.
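Of the guardrails listed above, the allowlist is the easiest to sketch in a few lines of shell. This is a minimal illustration, not a hardened sandbox. The function name is_allowed and the list of permitted programs are assumptions made up for this example. It only checks a program name, so on its own it does nothing about pipes, subshells, or dangerous arguments.

```shell
#!/bin/sh
# Hypothetical allowlist gate: succeeds only for program names we chose to trust.
# The list (gh, jq, neo4j-cli) is an assumption for illustration.
is_allowed() {
  case "$1" in
    gh|jq|neo4j-cli) return 0 ;;
    *) return 1 ;;
  esac
}

# Report the decision for a few candidate programs.
for program in gh jq rm curl; do
  if is_allowed "$program"; then
    echo "$program: allowed"
  else
    echo "$program: blocked"
  fi
done
```

A real setup would enforce this outside the agent's own shell (for example in the harness that launches commands) and combine it with the other measures, since a check the agent can edit is not much of a check.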

Where CLI shines: The same “wrap a service as a pile of dedicated tools” pattern shows up wherever MCP does. Postgres MCPs vs. psql. Kubernetes MCPs vs. kubectl. Filesystem MCPs vs. cat, ls, mv, grep glued by pipes. Same instinct every time, same CLI counterpart every time. And the same three failure modes too, because they aren’t really about any one product.

Nothing in the MCP spec actually requires this approach of piling up dedicated tools. The protocol asks for typed tools, nothing more; it says nothing about how narrow each tool has to be. Implementations just gravitate toward many small, narrow tools for historical reasons. You can build flexible tools that take a single expressive input the agent shapes however it wants, and most of the time you probably should.

To make it concrete, we’ll look at an example pitting the Neo4j MCP server against the Neo4j CLI. Disclaimer up front: I work at Neo4j. The choice is just convenience, but the lessons apply to most other CLIs. The Neo4j MCP server is the official server that exposes Neo4j to agents through MCP, shipping a handful of dedicated tools like read_query, write_query, and get_schema. neo4j.sh is the official command-line interface for Neo4j, a single binary you run in a terminal with credential profiles for each database you talk to.

To keep the comparison honest, we’ll only compare the read-query and schema pair on the MCP side with the equivalent query invocation in neo4j.sh. Same operations, same database, same Cypher going over the wire. The only thing that changes is whether the agent reaches them through a typed tool schema or through a string handed to a shell.

Querying across environments: We already saw how a pile of dedicated tools eats the context window with descriptions, and that some servers now ship deferred tools to push that cost off until the agent actually reaches for them. But there’s a second multiplier nobody talks about: what happens when you want to talk to more than one instance of the same service. With MCP, the tool count doesn’t just grow with features, it grows with environments.

The agent wants a node count from dev, staging, and prod. Through MCP, you stand up a neo4j-mcp-server per environment, each one carrying its four tool schemas into the agent’s context on every turn. Three databases means twelve schemas in the model’s window, the same four schemas three times over, before the agent has done anything.
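The multiplication above is simple enough to write down. The sketch below just restates it as a shell function; schemas_in_context is a hypothetical helper name, and it counts schemas only, not tokens, since the size of each schema is not given here.

```shell
#!/bin/sh
# Hypothetical helper: tool schemas resident in context
# = MCP servers (one per environment) x tools per server.
schemas_in_context() {
  echo $(( $1 * $2 ))
}

schemas_in_context 3 4   # dev, staging, prod with four tools each: prints 12
schemas_in_context 4 4   # a fourth environment with the same four tools: prints 16
```

The count grows linearly with environments even though no new capability has been added, which is the point of the paragraph above.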

Through the CLI, it’s a for loop:

$ for c in dev staging prod-ro; do neo4j-cli query -c "$c" --format toon "MATCH (n) RETURN count(n) AS nodes"; done

One binary, three credential profiles, zero per-turn context cost. Adding a fourth environment is one more credential dbms add, not one more MCP server process. The same shape carries over to any “reach out to N similar things” workflow you might want: snapshotting prod before a risky deploy, diffing the schema between staging and prod, running a health check across every database the agent knows about.
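As a rough illustration of the “diff the schema between staging and prod” case, here is a sketch built from standard tools. The function fetch_schema is a hypothetical stub that prints made-up labels so the example runs anywhere. A real version would call the CLI against each credential profile, in whatever way it exposes schema, which is not shown here.

```shell
#!/bin/sh
# Hypothetical stub standing in for a real per-environment schema dump.
# The labels are invented for illustration.
fetch_schema() {
  case "$1" in
    staging) printf '%s\n' 'Account' 'Transaction' 'Device' ;;
    prod-ro) printf '%s\n' 'Account' 'Transaction' ;;
  esac
}

# Print labels that exist in only one of the two environments.
schema_diff() {
  left_file=$(mktemp)
  right_file=$(mktemp)
  fetch_schema "$1" | sort > "$left_file"
  fetch_schema "$2" | sort > "$right_file"
  # comm -3 suppresses lines common to both sorted inputs.
  comm -3 "$left_file" "$right_file"
  rm -f "$left_file" "$right_file"
}

schema_diff staging prod-ro
```

The agent composes this from pieces it already knows (sort, comm, temporary files); no dedicated “diff schema” tool has to exist ahead of time.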

Chaining queries: Say the agent is investigating a known fraud account: from a single seed, find every account it transacted with, then find which other accounts those counterparties transact with the most often. Two queries against the same database, where the second’s parameters are the output of the first.

Through MCP, the model has to be the pipe. It calls read-cypher, the result comes back as a list of, say, 80 counterparty IDs, and those 80 IDs now sit in the model’s context. The model then formats them into the parameter for the second read-cypher call, and only then can the second query run. The intermediate list rides the conversation verbatim, and every extra ID is another row of context the agent pays for whether it ever reads it again or not.
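For contrast, a shell pipe moves the intermediate list from one process to the next without anyone reading it. The sketch below only demonstrates that mechanic. Both first_query and second_query are hypothetical stubs that do not talk to Neo4j, and the acct-N IDs are generated placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for query one: emits 80 counterparty IDs, one per line.
first_query() {
  i=1
  while [ "$i" -le 80 ]; do
    echo "acct-$i"
    i=$((i + 1))
  done
}

# Hypothetical stand-in for query two: consumes the IDs on stdin
# and prints only a one-line summary.
second_query() {
  count=$(wc -l | tr -d ' ')
  echo "received $count ids"
}

# The 80 IDs flow through the pipe; only the summary line is printed.
first_query | second_query
```

Only the final line would reach the agent’s context here, while in the tool-call flow described above all 80 IDs pass through the conversation twice: once as a result and once as a parameter.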