Enterprise AI Agents Are Leaving the Server | Focused Labs

Enterprise AI agents are leaving the server boundary. That boundary looks deceptively small until the agent starts acting on behalf of a person inside a browser tab, a desktop application, a row in a grid, a locally saved draft, a clipboard, a device permission, an approval flow, and the rest of the mess. That person’s work does not always translate into a server-side record, so server-only agent tools are insufficient as the primary integration model.

Backend tools cannot see the product moment. A server tool can update an account, search a knowledge base, create a ticket, or call an ERP workflow. All of these act on the record, after the product has turned intent into a stored fact. The product moment arrives earlier. A user selects three bullets from a proposed set of actions in a workflow. A sales engineer is editing the pricing for a set of products and has made unsaved changes to the discount on each. A support rep is viewing an incident timeline for a set of customers. A product manager has selected a cohort of customers for analysis.

The client knows where the cursor is, what the user has selected, the scroll position, the current route, the unsaved form data for the current step in the workflow, the dimensions of the viewport, the browser permission state, and the last UI action the user performed. The server knows none of this, or at best holds a stale object model for a set of records. That gap is why LangChain’s architecture for headless tools is so important. To the model, a headless tool is just another tool with a name, a description, a parameter schema, and a result. What matters is that the tool executes on the client.
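
To make the split concrete, here is a minimal sketch of a tool whose spec is what the model sees while its body runs on the client. Every name and shape here (`ModelFacingSpec`, `LiveClientState`, `ClientExecutedTool`, `readGridSelection`) is hypothetical and chosen for illustration; this is not LangChain's actual API.

```typescript
// Hypothetical shapes for illustration; not LangChain's actual API.

/** What the model sees: the same fields a server-side tool would expose. */
interface ModelFacingSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

/** State that lives only in the running application (assumed fields). */
interface LiveClientState {
  route: string;
  selectedRowIds: string[];
  unsavedFields: Record<string, string>;
}

interface GridSelectionResult {
  route: string;
  rowIds: string[];
  unsaved: Record<string, string>;
}

/** A tool whose spec goes to the model but whose body runs on the client. */
interface ClientExecutedTool<Args, Result> {
  spec: ModelFacingSpec;
  execute(args: Args, state: LiveClientState): Result;
}

const readGridSelection: ClientExecutedTool<{ includeUnsaved: boolean }, GridSelectionResult> = {
  spec: {
    name: "readGridSelection",
    description: "Return the rows the user currently has selected.",
    parameters: {
      type: "object",
      properties: { includeUnsaved: { type: "boolean" } },
      required: ["includeUnsaved"],
    },
  },
  execute(args, state) {
    return {
      route: state.route,
      rowIds: state.selectedRowIds.slice(),
      // Unsaved edits never reached the server, so only the client can report them.
      unsaved: args.includeUnsaved ? { ...state.unsavedFields } : {},
    };
  },
};
```

Nothing in `spec` reveals where the tool runs, which is the point: the model plans against an ordinary tool, and only the runtime knows the result came from live application state.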

This also shifts the focus of enterprise integration significantly. When we wrote about CRM integration moving into the agent runtime, we argued that identity, approval, retry, idempotency, and tracing decide whether the integration is safe. As we laid out this week, that same model is now crossing over into the browser. The backend runtime is still the right place for integrations with backend services. But the selected object in Figma, the unsaved field in a CRM modal, or, more simply, the browser permission prompt are all now in the agent’s execution path. The client runtime becomes part of the execution surface.

The client runtime becomes part of the execution surface when the agent has to act on state that only exists in the application. Frontend tools are contracts, not UI glue. The lazy approach is the side channel: serialize application state, send it off to the server as a big ol’ blob, let the model generate a reply, then ask the app to patch the UI from the result. Sure, that works the first time. Then the shape of the serialized data changes in a way that is not obvious even to the author of the code, the model starts operating on stale context, and nobody knows whether the current UI came from a user action, a tool execution, or a blind guess by the model that the app team simply followed.

Frontend tools make the contract explicit. AG-UI describes tools as frontend-defined functions passed to the agent at runtime, each with a name, a description, and a JSON Schema for its parameters. The frontend validates the arguments, executes the tool once the call has been fully received, and inserts the tool’s result into the conversation history. Simple. The important part is the control the frontend has over the capabilities passed to the agent: for each tool, it can decide whether to add it to or remove it from the runtime based on user permissions, application context, and state (AG-UI tools).
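
A rough sketch of that control, under stated assumptions: the types below (`FrontendTool`, `AppContext`, `AdvertisedTool`), the `quote:edit` permission string, and the tiny `validateArgs` checker are all hypothetical stand-ins. AG-UI's real SDK types may differ, and a production app would use a full JSON Schema validator.

```typescript
// Hypothetical types for illustration; AG-UI's real SDK types may differ.
type PrimitiveType = "string" | "number" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: PrimitiveType }>;
  required: string[];
}

interface AppContext {
  permissions: string[];
  recordEditable: boolean;
}

interface AdvertisedTool {
  name: string;
  description: string;
  parameters: ObjectSchema;
}

interface FrontendTool extends AdvertisedTool {
  // Decides per run whether the agent is offered this capability at all.
  isAvailable(ctx: AppContext): boolean;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

/** Only tools that pass their availability check are advertised to the agent. */
function toolsForRun(all: FrontendTool[], ctx: AppContext): AdvertisedTool[] {
  return all
    .filter((tool) => tool.isAvailable(ctx))
    .map((tool) => ({ name: tool.name, description: tool.description, parameters: tool.parameters }));
}

/** Tiny stand-in for a JSON Schema validator: required keys and primitive types only. */
function validateArgs(schema: ObjectSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!hasOwn(args, key)) errors.push(`missing required argument: ${key}`);
  }
  for (const key of Object.keys(args)) {
    const expected = hasOwn(schema.properties, key) ? schema.properties[key] : undefined;
    if (!expected) errors.push(`unexpected argument: ${key}`);
    else if (typeof args[key] !== expected.type) errors.push(`${key} must be a ${expected.type}`);
  }
  return errors;
}

const insertApprovedClause: FrontendTool = {
  name: "insertApprovedClause",
  description: "Insert a clause from the approved library into the open quote.",
  parameters: { type: "object", properties: { clauseId: { type: "string" } }, required: ["clauseId"] },
  isAvailable: (ctx) => ctx.recordEditable && ctx.permissions.indexOf("quote:edit") !== -1,
};
```

Filtering before the run starts means an unavailable capability is never described to the model, rather than being offered and then refused.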

A quote editor, for example, might allow insertApprovedClause only when the quote’s record is editable, the clause was chosen from the approved library, and the user has permission to change quotes. A support console, on the other hand, might allow draftCustomerReply freely but require approval for sendCustomerReply. A design tool might allow summarizeSelectedFrame without approval but require it for replaceSelectedFrameCopy. A client-side tool call carries validation, approval, execution, and evidence through one lifecycle.
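
The lifecycle in the last sentence can be sketched as one function that every client-side call passes through. This is an illustrative sketch, not a library API: `GatedTool`, `Approver`, `EvidenceEntry`, and `runGatedCall` are hypothetical names, and approval is modeled as a synchronous callback for brevity, whereas a real product would await a human click.

```typescript
// Hypothetical lifecycle sketch; names and shapes are assumptions, not a library API.
type CallOutcome =
  | { status: "invalid"; errors: string[] }
  | { status: "denied" }
  | { status: "executed"; result: string };

interface GatedTool {
  name: string;
  requiresApproval: boolean;
  validate(args: Record<string, string>): string[];
  execute(args: Record<string, string>): string;
}

interface EvidenceEntry {
  tool: string;
  args: Record<string, string>;
  status: CallOutcome["status"];
  approvedBy: string | null;
}

/** Returns the approver's id, or null when the human declines. */
type Approver = (tool: string, args: Record<string, string>) => string | null;

function runGatedCall(
  tool: GatedTool,
  args: Record<string, string>,
  approver: Approver,
  evidence: EvidenceEntry[],
): CallOutcome {
  // Every exit path writes one evidence entry, including rejections.
  const record = (outcome: CallOutcome, approvedBy: string | null): CallOutcome => {
    evidence.push({ tool: tool.name, args, status: outcome.status, approvedBy });
    return outcome;
  };

  const errors = tool.validate(args);
  if (errors.length > 0) return record({ status: "invalid", errors }, null);

  let approvedBy: string | null = null;
  if (tool.requiresApproval) {
    approvedBy = approver(tool.name, args);
    if (approvedBy === null) return record({ status: "denied" }, null);
  }
  return record({ status: "executed", result: tool.execute(args) }, approvedBy);
}

const requireBody = (args: Record<string, string>): string[] =>
  args["body"] ? [] : ["body is required"];

const draftCustomerReply: GatedTool = {
  name: "draftCustomerReply",
  requiresApproval: false,
  validate: requireBody,
  execute: (args) => `draft saved: ${args["body"]}`,
};

const sendCustomerReply: GatedTool = {
  name: "sendCustomerReply",
  requiresApproval: true,
  validate: requireBody,
  execute: (args) => `sent: ${args["body"]}`,
};
```

Because the evidence entry is written on every exit path, the log can answer later whether a given UI change came from an approved tool execution, a denied one, or a malformed call.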

We argued earlier that agent UI is runtime infrastructure because event streams give products typed handles for tools, state, approvals, subagents, errors, and observability. Client-executed tools make that argument less theoretical. A product UI is no longer merely a shell around an agent. It owns executable capabilities the agent cannot safely fake from the server.

AG-UI is the protocol layer showing up on schedule. MCP provides a standard interface to tools and data for agents, and A2A provides a standard interface for agents to interact with other agents. AG-UI targets the interface between the agent and the user-facing application. In this space, the UI has to handle events (programmatic or human-triggered), streamed updates, multimodal input (e.g., speech and ink), shared state, frontend tool calls, and human-in-the-loop interrupts. This is the scope of functionality AG-UI currently defines.
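
As a sketch of what handling a streamed frontend tool call can look like, the assembler below buffers argument fragments per call and only hands back a complete call once the end event arrives. The event names and fields are modeled loosely on AG-UI's tool call events and simplified; treat the exact shapes, along with the `ToolCallAssembler` and `CompletedCall` names, as assumptions and check the protocol documentation before relying on them.

```typescript
// Simplified event shapes modeled loosely on AG-UI tool call events (assumed, not authoritative).
type ToolCallEvent =
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "TOOL_CALL_END"; toolCallId: string };

interface CompletedCall {
  id: string;
  name: string;
  args: unknown;
}

/** Buffers streamed argument fragments and emits a call only when it is complete. */
class ToolCallAssembler {
  private pending: Record<string, { name: string; buffer: string }> = {};

  push(event: ToolCallEvent): CompletedCall | null {
    switch (event.type) {
      case "TOOL_CALL_START":
        this.pending[event.toolCallId] = { name: event.toolCallName, buffer: "" };
        return null;
      case "TOOL_CALL_ARGS": {
        const call = this.pending[event.toolCallId];
        // Fragments for an unknown call id are ignored in this sketch.
        if (call) call.buffer += event.delta;
        return null;
      }
      case "TOOL_CALL_END": {
        const call = this.pending[event.toolCallId];
        if (!call) return null;
        delete this.pending[event.toolCallId];
        // JSON.parse throws on malformed input; a real client would report that as a tool error.
        return { id: event.toolCallId, name: call.name, args: JSON.parse(call.buffer || "{}") };
      }
    }
  }
}
```

Waiting for the end event before parsing keeps validation and approval from running against half-received arguments.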

The system has a clear boundary at the point where the user-facing application can determine the facts of the runtime: who is currently present; what the user has selected; what has changed locally on the user’s workstation that will affect tool results; what can be undone there; and what requires a human click before a particular set of side effects can occur on the server. Once a tool moves from brochureware integration to production action within the product, the agent-operable interface is the product.

Microsoft’s Agent Framework AG-UI integration points in the same direction. Its documentation lists real-time streaming, session and thread management, state synchronization and sharing, human-in-the…
