BALAR: A Bayesian Agentic Loop for Active Reasoning

Abstract: Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next.

We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction between an LLM agent and a user. BALAR maintains a structured belief over latent states, selects clarifying questions by maximizing expected mutual information, and dynamically expands its state representation when the current one proves insufficient.
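
The question-selection step described above can be illustrated with a minimal sketch. It assumes a small discrete set of hypotheses and a known table of answer likelihoods per question; the function names, the toy suspects, and the likelihood values are all hypothetical and are not taken from the paper. In BALAR the beliefs and likelihoods would presumably be produced with the help of the LLM, and the dynamic state-expansion step is not modeled here.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def bayes_update(prior, likelihood, answer):
    """Posterior over hypotheses after observing `answer`.

    likelihood[h][a] is the assumed P(answer a | hypothesis h).
    """
    unnorm = [prior[h] * likelihood[h][answer] for h in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def expected_information_gain(prior, likelihood):
    """Mutual information between the hypothesis and the answer.

    Computed as H(prior) minus the expected posterior entropy.
    """
    gain = entropy(prior)
    n_answers = len(likelihood[0])
    for a in range(n_answers):
        p_a = sum(prior[h] * likelihood[h][a] for h in range(len(prior)))
        if p_a == 0:
            continue  # an impossible answer contributes nothing
        gain -= p_a * entropy(bayes_update(prior, likelihood, a))
    return gain


def select_question(prior, questions):
    """Pick the question whose answer is expected to be most informative."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))


# Toy example (hypothetical): three suspects, uniform belief.
prior = [1 / 3, 1 / 3, 1 / 3]
questions = {
    # Only suspect 0 would answer "yes" (index 0) to the alibi question.
    "alibi": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    # The weather question is equally likely to get either answer from anyone.
    "weather": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}

best = select_question(prior, questions)
posterior = bayes_update(prior, questions[best], 0)
```

Here the alibi question is preferred because its answer changes the belief over suspects, while the weather question carries no information about them. A full loop would repeat the select, ask, and update steps until the belief is concentrated enough to commit to an answer.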

We evaluate BALAR on three diverse benchmarks: AR-Bench-DC (detective cases), AR-Bench-SP (thinking puzzles), and iCraft-MD (clinical diagnosis). BALAR significantly outperforms all baselines across all three benchmarks, with 14.6% higher accuracy on AR-Bench-DC, 38.5% on AR-Bench-SP, and 30.5% on iCraft-MD.

Paper Details:

Authors: Aymen Echarghaoui, Dongxia Wu, Emily B. Fox
Submission Date: 6 May 2026
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
