Hidden Anchors in Multi-Agent LLM Deliberation

Abstract: Multi-agent LLM deliberation, in which agents exchange and revise answers over several rounds, is increasingly used to improve reasoning and accuracy, yet how and why it works is rarely modelled. Such deliberation mirrors how humans reach decisions. As social animals, we are pulled both by the group (the herd effect that classical opinion-dynamics models such as DeGroot and Friedkin-Johnsen capture) and by our own internal belief, which those models do not capture.
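For reference, the two classical updates named above can be written in their standard textbook form. The notation ($x_i(t)$ for agent $i$'s opinion at round $t$, $w_{ij}$ for influence weights, $\lambda_i$ for susceptibility) is assumed here for illustration and is not taken from the abstract.

```latex
% DeGroot: each agent averages its neighbours' current opinions
x_i(t+1) = \sum_{j} w_{ij}\, x_j(t),
\qquad w_{ij} \ge 0,\ \sum_{j} w_{ij} = 1

% Friedkin-Johnsen: averaging plus attachment to the agent's own initial opinion
x_i(t+1) = \lambda_i \sum_{j} w_{ij}\, x_j(t) + (1 - \lambda_i)\, x_i(0),
\qquad \lambda_i \in [0, 1]
```

In both rules the new opinion is a convex combination of current and initial opinions, so by induction every $x_i(t)$ stays inside the convex hull of the initial opinions. The only fixed reference point in Friedkin-Johnsen is the agent's own starting opinion, which lies inside that hull by definition.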

We model multi-agent deliberation as a closed-loop dynamical system in which each agent carries a hidden internal belief, its anchor, that continually pulls its opinion regardless of what its neighbours say. We show that this anchor can be recovered from the deliberation alone, and that it explains a behaviour classical consensus rules forbid: an agent’s confidence in the correct answer can climb past where any agent started, escaping the convex hull of the initial beliefs.
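The hull-escape behaviour can be illustrated with a toy simulation. This is a minimal sketch under assumed dynamics, not the model from the paper: the update rule `anchored_step`, the mixing weight `lam`, the complete-graph weights `W`, and all numeric values are hypothetical choices made for illustration.

```python
import numpy as np


def degroot_step(x, W):
    """Classical DeGroot update: a weighted average of neighbours' opinions."""
    return W @ x


def anchored_step(x, W, anchor, lam):
    """Assumed anchored update (hypothetical form, not the paper's model):
    weight lam on the neighbour average, 1 - lam on a hidden anchor."""
    return lam * (W @ x) + (1.0 - lam) * anchor


def simulate(step, x0, rounds):
    """Apply `step` repeatedly; returns an array of shape (rounds + 1, n)."""
    traj = [x0]
    for _ in range(rounds):
        traj.append(step(traj[-1]))
    return np.array(traj)


n = 4
W = np.full((n, n), 1.0 / n)              # complete graph, equal weights
x0 = np.array([0.40, 0.50, 0.55, 0.60])   # initial confidence in the correct answer
anchor = np.full(n, 0.90)                 # hypothetical anchors outside [0.40, 0.60]
lam = 0.7                                 # hypothetical neighbour weight

degroot = simulate(lambda x: degroot_step(x, W), x0, 10)
anchored = simulate(lambda x: anchored_step(x, W, anchor, lam), x0, 10)

print("hull of initial opinions:", x0.min(), x0.max())
print("DeGroot final opinions:  ", degroot[-1])
print("anchored final opinions: ", anchored[-1])
```

Under these assumptions the DeGroot trajectory never leaves the interval spanned by `x0`, while the anchored trajectory drifts toward 0.90 and so ends above every starting opinion. If the anchor were placed inside the initial interval instead, both trajectories would stay in the hull.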

Checking whether the recovered anchor also predicts held-out runs, that is, whether it generalizes, gives a simple test for whether a model is truly driven by such an anchor. Across three open-weight model families the answer is a spectrum, not all-or-nothing. The anchor pulls about equally strongly in all three families, but the families differ in where the anchor sits; only when it sits far from the initial opinions does deliberation escape the hull and require the full closed-loop model.
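The held-out check can be sketched on synthetic data. Everything below is an assumption made for illustration rather than the authors' estimator: the scalar update `x_i(t+1) = lam * mean(x(t)) + (1 - lam) * anchor`, the helper names `simulate_run`, `fit_anchor`, and `one_step_rmse`, and the noise level. The anchor is fitted by ordinary least squares on training runs and then scored by one-step-ahead error on runs it never saw, next to an anchor-free averaging baseline.

```python
import numpy as np


def simulate_run(x0, lam, anchor, rounds, rng, noise=0.005):
    """Hypothetical data generator: complete-graph averaging plus a shared anchor."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(rounds):
        nxt = lam * traj[-1].mean() + (1.0 - lam) * anchor
        traj.append(nxt + noise * rng.standard_normal(len(x0)))
    return np.array(traj)


def fit_anchor(runs):
    """Least-squares fit of x_i(t+1) = lam * mean(x(t)) + (1 - lam) * anchor."""
    m = np.concatenate([np.repeat(r[:-1].mean(axis=1), r.shape[1]) for r in runs])
    y = np.concatenate([r[1:].ravel() for r in runs])
    slope, intercept = np.polyfit(m, y, 1)
    return slope, intercept / (1.0 - slope)   # (lam_hat, anchor_hat)


def one_step_rmse(runs, lam, anchor):
    """One-step-ahead prediction error; lam = 1 gives the anchor-free baseline."""
    errs = []
    for r in runs:
        pred = lam * r[:-1].mean(axis=1, keepdims=True) + (1.0 - lam) * anchor
        errs.append((r[1:] - pred).ravel())
    return float(np.sqrt(np.mean(np.concatenate(errs) ** 2)))


rng = np.random.default_rng(0)
make = lambda: simulate_run(rng.uniform(0.4, 0.6, size=4), 0.7, 0.9, 8, rng)
train = [make() for _ in range(6)]
held_out = [make() for _ in range(3)]

lam_hat, anchor_hat = fit_anchor(train)
rmse_anchor = one_step_rmse(held_out, lam_hat, anchor_hat)
rmse_baseline = one_step_rmse(held_out, 1.0, 0.0)   # pure averaging, no anchor

print("fitted lam, anchor:", lam_hat, anchor_hat)
print("held-out RMSE with anchor:", rmse_anchor, "baseline:", rmse_baseline)
```

In this synthetic setting the data really are anchor-driven, so the fitted anchor should predict held-out runs far better than the baseline. On real deliberation logs the size of that gap is what the test would measure; a small gap would suggest the averaging model already suffices.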
