The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
Abstract: Existing multi-model and tool-augmented systems communicate by generating text, serializing every exchange through the output vocabulary. Can two pretrained language models instead coordinate through a continuous, concurrent channel?
The Bicameral Model couples two frozen language models through a trainable neural interface on their intermediate hidden states. At every generation step, both models run in lockstep: a primary model drives the task while an auxiliary model operates tools, solves constraints, or executes code, with both conditioning on each other’s activations through a translation network and a learned suppression gate ($\sim$1% of combined parameters).
The gate learns a selective communication protocol from task loss alone, without a prescribed format. We demonstrate the mechanism across three tool backends. On arithmetic, coupling two 0.5B models with a calculator raises accuracy from 36% to 96%.
On logic grid puzzles, coupling two 0.6B models with a Z3 solver achieves $1.7\times$ the accuracy of the unaugmented baseline on ZebraLogic. On mathematical reasoning, coupling with a Python sandbox enables the auxiliary to generate problem-specific code from hidden-state signals alone, without ever seeing the problem text.