What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. This free-form communication can rapidly inflate token usage and consume the shared context window, which in turn degrades system performance and raises inference cost.

We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents.

Building on this finding, we propose PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history.
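
To make the idea of projecting raw output into an action-state record concrete, the following is a minimal illustrative sketch, not the authors' implementation. The names `ActionStateRecord`, `project`, and `SharedHistory`, the record fields, and the `ACTION:`/`STATE:` line tags are all assumptions introduced here for illustration; the abstract does not specify the record schema or how the projection is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionStateRecord:
    """Hypothetical compact record; the field names are assumptions."""
    agent: str
    action: str
    state: str


def project(agent: str, raw_output: str, max_chars: int = 80) -> ActionStateRecord:
    """Toy projection: keep only lines tagged ACTION:/STATE:, drop the rest.

    A rule-based stand-in for illustration; the real projection may differ.
    """
    action, state = "", ""
    for line in raw_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ACTION:"):
            action = stripped[len("ACTION:"):].strip()[:max_chars]
        elif stripped.startswith("STATE:"):
            state = stripped[len("STATE:"):].strip()[:max_chars]
    return ActionStateRecord(agent, action, state)


class SharedHistory:
    """Public state that downstream agents read instead of raw transcripts."""

    def __init__(self) -> None:
        self.records: list[ActionStateRecord] = []

    def publish(self, agent: str, raw_output: str) -> ActionStateRecord:
        # Project first, so raw free-form text never enters shared history.
        record = project(agent, raw_output)
        self.records.append(record)
        return record

    def render(self) -> str:
        # Serialize the compact records into the context given to the next agent.
        return "\n".join(
            f"[{r.agent}] action={r.action} | state={r.state}" for r in self.records
        )


if __name__ == "__main__":
    history = SharedHistory()
    raw = (
        "Let me think about this step by step...\n"
        "ACTION: patched utils.py\n"
        "STATE: tests pass\n"
        "Further reasoning that downstream agents do not need."
    )
    history.publish("coder", raw)
    print(history.render())
```

In this sketch the shared history only ever stores the projected record, so the context passed to the next agent grows with the number of records rather than with the length of each agent's raw output.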

Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: on OpenHands, PACT raises the resolve rate while using 10% fewer tokens per resolved task; on SWE-agent, it leaves the resolve rate unchanged while halving input tokens. Our code is publicly available at this https URL.
