CHAL: Council of Hierarchical Agentic Language

Abstract: Multi-agent debate has emerged as a promising approach for improving LLM reasoning on ground-truth tasks, yet current methods face structural limitations: debate tends to induce a martingale over belief trajectories, majority voting accounts for most of the observed gains, and LLMs exhibit confidence escalation rather than calibration across rounds.
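
A minimal formal sketch of the martingale claim (our notation, not taken from the paper): let $b_t$ denote an agent's credence in its current answer after debate round $t$. The observation is that, conditioned on the history, further rounds do not move the credence in expectation,

\[
\mathbb{E}\!\left[\, b_{t+1} \mid b_0, b_1, \dots, b_t \,\right] = b_t ,
\]

so individual beliefs are, on average, redistributed rather than improved, and aggregate accuracy gains are attributable mainly to voting.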

We argue that the genuine value of debate, and of dialectic systems more broadly, lies not in ground-truth tasks but in defeasible domains, where every position can in principle be defeated by better reasoning.

We present the Council of Hierarchical Agentic Language (CHAL), a multi-agent dialectic framework that treats defeasible argumentation as an engine for belief optimization. Each agent maintains a CHAL Belief Schema (CBS), a graph-structured, Bayesian-inspired belief representation that supports belief revision through a gradient-informed update mechanism, treating the strength of the belief's thesis as a differentiable objective.
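
To make the CBS description concrete, the following is a minimal, hypothetical sketch: the class names, the log-odds pooling rule, and the finite-difference update are our illustrative assumptions, not CHAL's actual implementation, and the real schema is graph-structured whereas this sketch keeps only a thesis with directly supporting premises.

```python
# Hypothetical sketch of a CHAL Belief Schema (CBS): thesis strength as a
# differentiable objective over premise weights, with gradient-informed revision.
from dataclasses import dataclass, field
import math

@dataclass
class PremiseNode:
    claim: str
    weight: float  # credence-like support weight in (0, 1)

@dataclass
class BeliefSchema:
    thesis: str
    premises: list[PremiseNode] = field(default_factory=list)

    def thesis_strength(self) -> float:
        # Bayesian-inspired pooling: average the premises' log-odds, then
        # squash back to (0, 1). A stand-in for CHAL's actual objective.
        logits = [math.log(p.weight / (1.0 - p.weight)) for p in self.premises]
        return 1.0 / (1.0 + math.exp(-sum(logits) / len(logits)))

    def revise(self, challenged: int, damage: float, lr: float = 0.1) -> None:
        # Gradient-informed revision, sketched with finite differences:
        # weaken the defeated premise, then nudge the surviving premises in
        # the direction that most increases thesis strength.
        node = self.premises[challenged]
        node.weight = max(1e-3, node.weight - damage)
        eps, base = 1e-4, self.thesis_strength()
        for i, p in enumerate(self.premises):
            if i == challenged:
                continue
            p.weight += eps
            grad = (self.thesis_strength() - base) / eps
            p.weight -= eps
            p.weight = min(1.0 - 1e-3, max(1e-3, p.weight + lr * grad))

cbs = BeliefSchema(
    thesis="Remote work improves team productivity",
    premises=[PremiseNode("Fewer interruptions", 0.7),
              PremiseNode("Async communication scales", 0.6)])
print(round(cbs.thesis_strength(), 3))   # strength before the attack
cbs.revise(challenged=0, damage=0.3)     # an opposing agent defeats premise 0
print(round(cbs.thesis_strength(), 3))   # strength after gradient-informed repair
```

In this picture, a successful attack lowers one node's weight, and the gradient of thesis strength indicates how to redistribute support across the remaining nodes, which is the sense in which revision is gradient-informed.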

Meta-cognitive value systems spanning epistemology, logic, and ethics are exposed as configurable hyperparameters that govern agent reasoning and adjudication outcomes.
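
As a hedged illustration of what values-as-hyperparameters could look like in configuration terms (the dimension names, weights, and scoring rule below are our own assumptions, not the framework's API):

```python
# Hypothetical value-system configuration for an adjudicator agent.
from dataclasses import dataclass

@dataclass(frozen=True)
class ValueSystem:
    epistemology: float   # weight on evidential support
    logic: float          # weight on internal consistency
    ethics: float         # weight on normative acceptability

    def adjudicate(self, scores: dict[str, float]) -> float:
        # Weighted score the adjudicator assigns to one agent's position.
        total = self.epistemology + self.logic + self.ethics
        return (self.epistemology * scores["evidence"]
                + self.logic * scores["consistency"]
                + self.ethics * scores["ethics"]) / total

# Two adjudicators with different value systems rank the same argument differently.
empiricist = ValueSystem(epistemology=0.6, logic=0.3, ethics=0.1)
deontic    = ValueSystem(epistemology=0.2, logic=0.3, ethics=0.5)
argument = {"evidence": 0.8, "consistency": 0.7, "ethics": 0.4}
print(empiricist.adjudicate(argument))  # ~0.73
print(deontic.adjudicate(argument))     # ~0.57
```

Shifting weight from epistemology toward ethics changes which arguments the adjudicator favors, which is the sense in which the adjudicator's value system steers the debate's trajectory.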

We provide a series of ablation experiments demonstrating systematic and interpretable effects: the adjudicator's value system determines the debate's overall trajectory in latent belief space, council diversity refines the beliefs of all participants, and the framework generalizes across a broad range of domains.

CHAL is, to our knowledge, the first framework to treat multi-agent debate as structured belief optimization over defeasible domains. Further, the auditable belief artifacts it produces establish the foundation for dedicated evaluation suites for defeasible argumentation, with broader implications for building AI systems whose reasoning and value commitments are transparent, aligned, and subject to human oversight.
