Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

Abstract: Deep research and agent evolution are two core tasks for AI agents in real-world applications on the path toward artificial general intelligence. The former enables agents to autonomously retrieve and integrate information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static, parametric deep research capabilities of agent systems. The latter allows agents to interact autonomously with the environment and gain experience that evolves model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap for open-ended research tasks.

To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which uses hybrid-mode reinforcement learning to drive the collaborative evolution of a proposer, a solver, and a judge grounded in web-scale knowledge, moving toward autonomously evolving agents in open-ended tasks and environments.

Extensive experiments on three long-form deep research benchmarks show that an 8B model trained with HOTE surpasses the strongest static open models in the 8B to 32B range, as well as models trained with state-of-the-art deep research training methods, while incurring less time overhead. The experiments further verify that evolving all three modules in HOTE is indispensable.