Structure-Induced Information for Rerooting Levin Tree Search

基于结构诱导信息的重根 Levin 树搜索

Abstract: Subgoal-based policy tree search, which uses a policy to guide search, is effective for complex single-agent deterministic problems but often relies on explicit subgoal generation that can incur substantial overhead and hinders scalability.

摘要： 基于子目标的策略树搜索利用策略来引导搜索，在处理复杂的单智能体确定性问题时非常有效，但它通常依赖于显式的子目标生成，这会带来巨大的开销并阻碍扩展性。

In this paper, we overcome these limitations by using a learned “rerooter” through the recently-introduced $\sqrt{\text{LTS}}$ algorithm. A rerooter implicitly decomposes the problem into soft subtasks.

在本文中，我们通过最近提出的 $\sqrt{\text{LTS}}$ 算法，利用学习到的“重根器”（rerooter）克服了这些局限性。重根器能够将问题隐式地分解为软子任务（soft subtasks）。

While previous work focused on the formal guarantees for given or handcrafted rerooters, in this work we propose three rerooter designs: (i) a clustering-based rerooter that exploits global state-space structure, (ii) a heuristic-based rerooter that leverages learned cost-to-go estimates, and (iii) a hybrid that combines both signals.

虽然以往的研究主要关注给定或手工设计的重根器的形式化保证，但在本研究中，我们提出了三种重根器设计：(i) 利用全局状态空间结构的基于聚类的重根器；(ii) 利用学习到的剩余代价估计（cost-to-go estimates）的基于启发式的重根器；以及 (iii) 结合了上述两种信号的混合重根器。

Our framework avoids having to explicitly reconstruct and reason over generated subgoals, thereby enabling scalable allocation of search effort with significantly lower computational overhead.

我们的框架避免了对生成的子目标进行显式重构和推理，从而能够以显著降低的计算开销实现搜索工作量的可扩展分配。

Empirically, our rerooting-based methods scale to complex environments where subgoal-based policy tree search fails, and achieve state-of-the-art online training efficiency on the domains tested.

实验结果表明，我们的重根方法能够扩展到基于子目标的策略树搜索失效的复杂环境，并在所测试的领域中实现了最先进的在线训练效率。