Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis

Abstract: We present the Consilium Protocol, an architecture derived from Byzantine Fault Tolerance (BFT) for structured multi-model AI deliberation that treats inter-model disagreement as epistemic signal rather than error. The protocol assigns engineered cognitive personas to language models, separating what a model is from how it reasons, and introduces an In-Sample/Out-of-Sample validation framework, adapted from quantitative finance, to distinguish training-data consensus from empirically grounded conclusions.

Across 1,478 deliberation sessions spanning 32 topics in 10 domain categories, we demonstrate that (1) the cognitive persona, not the underlying model, determines epistemic behavior: free edge-inference models costing 0.0002 USD per batch produced analytical output comparable to that of frontier models costing 10.69 USD; (2) RLHF alignment training creates measurable, domain-specific epistemic blind spots: contested policy topics exhibit 12.3 percentage points less adversarial challenge than settled science topics, and AI safety topics show an asymmetric bias ($\Delta$=11.6%) in which models challenge claims that AI is dangerous far more vigorously than claims that AI risk is overstated; (3) the protocol exhibits no directional bias of its own (immigration $\Delta$=2.3%, renewables $\Delta$=1.2%); and (4) out-of-sample evidence retrieval validated 239 claims with a 100% evidence-retrieval rate and surfaced 167 blind-spot discoveries invisible to training-data deliberation.

Run-to-run reproducibility is high: across randomized model$\times$persona assignments, the standard deviation averages $\pm$2.2%. The complete battery cost 217 USD in total, including all overhead. We release the protocol specification under the MIT license to enable independent verification.
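
The directional-bias figures above ($\Delta$) are reported without a formula in this abstract. As a minimal sketch, assuming $\Delta$ is the absolute gap, in percentage points, between the adversarial challenge rates observed under two opposing framings of the same topic, the computation could look like the following. The `FramingResult` structure, the `directional_asymmetry` function, and the counts are all hypothetical and chosen only for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FramingResult:
    """Counts for one framing of a topic (hypothetical structure)."""
    claims: int      # claims deliberated under this framing
    challenged: int  # claims that drew an adversarial challenge

    def __post_init__(self) -> None:
        if self.claims <= 0 or not 0 <= self.challenged <= self.claims:
            raise ValueError("need claims > 0 and 0 <= challenged <= claims")

    @property
    def challenge_rate(self) -> float:
        """Share of claims that were challenged, as a percentage."""
        return 100.0 * self.challenged / self.claims


def directional_asymmetry(pro: FramingResult, con: FramingResult) -> float:
    """Assumed definition of Delta: absolute challenge-rate gap in percentage points."""
    return abs(pro.challenge_rate - con.challenge_rate)


# Placeholder counts that only exercise the function; not data from the study.
pro = FramingResult(claims=200, challenged=90)
con = FramingResult(claims=200, challenged=80)
delta = directional_asymmetry(pro, con)
print(f"delta = {delta:.1f} percentage points")
```

Under this assumed definition, a small $\Delta$ means both framings of a topic are challenged at similar rates, while a large $\Delta$ indicates that one side is scrutinized more heavily than the other.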
