Consensus Is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

Abstract: Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error.

Building on prior work on reasoning-trace disagreement in human-AI collaborative moderation, we propose a knowledge-representation layer in which reasoning traces and agent decisions are abstracted into symbolic disagreement states.

Given agents that produce explicit reasoning traces and binary decisions, we distinguish four states along two axes, reasoning similarity (convergent vs. divergent) and conclusion agreement (agreement vs. disagreement): convergent agreement, divergent agreement, convergent disagreement, and divergent disagreement. These states support defeasible strategic routing rules.
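As an illustration only, the four states and a defeasible routing rule might be sketched as follows. The abstract does not specify a similarity measure, a threshold, or routing targets, so the Jaccard token overlap, the 0.5 threshold, the `high_stakes` exception, and all route names below are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class DisagreementState(Enum):
    CONVERGENT_AGREEMENT = "convergent agreement"        # similar reasoning, same decision
    DIVERGENT_AGREEMENT = "divergent agreement"          # different reasoning, same decision
    CONVERGENT_DISAGREEMENT = "convergent disagreement"  # similar reasoning, different decision
    DIVERGENT_DISAGREEMENT = "divergent disagreement"    # different reasoning, different decision


@dataclass(frozen=True)
class AgentOutput:
    trace: str      # explicit reasoning trace
    decision: bool  # binary decision, e.g. True = remove the content


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for any trace-similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def classify(x: AgentOutput, y: AgentOutput, threshold: float = 0.5) -> DisagreementState:
    """Map two agent outputs to one of the four symbolic states."""
    similar = jaccard(x.trace, y.trace) >= threshold  # threshold is an assumption
    agree = x.decision == y.decision
    if agree:
        return (DisagreementState.CONVERGENT_AGREEMENT if similar
                else DisagreementState.DIVERGENT_AGREEMENT)
    return (DisagreementState.CONVERGENT_DISAGREEMENT if similar
            else DisagreementState.DIVERGENT_DISAGREEMENT)


# Hypothetical default routes; none of these names come from the paper.
DEFAULT_ROUTES = {
    DisagreementState.CONVERGENT_AGREEMENT: "auto_resolve",
    DisagreementState.DIVERGENT_AGREEMENT: "accept_and_log",
    DisagreementState.CONVERGENT_DISAGREEMENT: "request_clarification",
    DisagreementState.DIVERGENT_DISAGREEMENT: "escalate_to_human",
}


def route(state: DisagreementState, high_stakes: bool = False) -> str:
    """Defeasible rule: the default route holds unless an exception defeats it."""
    if high_stakes and state is not DisagreementState.CONVERGENT_AGREEMENT:
        return "escalate_to_human"  # assumed exception that overrides the default
    return DEFAULT_ROUTES[state]


a = AgentOutput("the post contains a direct threat of violence", True)
b = AgentOutput("the post contains a direct threat of harm", True)
c = AgentOutput("satire protected as political speech", False)

print(classify(a, b).value, "->", route(classify(a, b)))
print(classify(a, c).value, "->", route(classify(a, c)))
```

The point of the sketch is that routing depends on the symbolic state rather than on a vote count: two agents that agree for different reasons are handled differently from two that agree for the same reason.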

We instantiate the framework in content moderation and argue that disagreement-aware routing provides a bridge between sub-symbolic LLM deliberation and symbolic knowledge representation for multi-agent strategic reasoning.
