Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search

Abstract: Test-time scaling for agentic search typically increases depth (i.e., more turns and tokens per trajectory) or breadth (i.e., more parallel rollouts). We focus on breadth scaling and show that standard parallel sampling yields diminishing returns, which we trace to query redundancy at the first turn. When models issue similar first queries across rollouts, the threads retrieve overlapping evidence, and subsequent turns are conditioned on this shared retrieval.
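One rough way to make first-turn redundancy concrete is to score how similar the first queries of several rollouts are to each other. The sketch below is only an illustration: the abstract does not say how redundancy is measured, so the token-level Jaccard similarity and the function names `jaccard` and `mean_pairwise_similarity` are assumptions, not the authors' metric.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two queries (assumed metric)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty queries are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mean_pairwise_similarity(queries: list[str]) -> float:
    """Average similarity over all pairs of first-turn queries.

    Values near 1.0 suggest the rollouts start from nearly the same query
    and are likely to retrieve overlapping evidence.
    """
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0  # fewer than two queries: no redundancy to measure
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A high score over the first queries of k rollouts would be consistent with the redundancy described above, although lexical overlap is only a proxy for overlap in the retrieved evidence.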

We address this limitation with DivInit, a training-free intervention at the first turn. Rather than sampling k independent first queries, DivInit draws n candidates from a single call, picks k < n diverse seeds, and runs them as parallel trajectories. Across five open-weight models and eight benchmarks, DivInit consistently improves over standard parallel sampling, with average gains of five to seven points on multi-hop QA at matched compute.
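The select-k-of-n step can be sketched as follows. The abstract does not specify how diversity is scored or how seeds are picked, so this sketch assumes a greedy max-min (farthest-point) selection over token-level Jaccard distance; `token_distance` and `select_diverse_seeds` are hypothetical names, and the single model call that produces the n candidates is not shown.

```python
def token_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity over lowercase tokens (assumed diversity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_diverse_seeds(candidates: list[str], k: int) -> list[str]:
    """Pick up to k mutually dissimilar queries from n candidates.

    Greedy max-min selection: start from the first candidate, then
    repeatedly add the candidate whose nearest already-selected seed
    is farthest away.
    """
    unique = list(dict.fromkeys(candidates))  # drop exact duplicates, keep order
    if k <= 0:
        return []
    if k >= len(unique):
        return unique
    selected = [unique[0]]
    remaining = unique[1:]
    while len(selected) < k:
        best = max(
            remaining,
            key=lambda c: min(token_distance(c, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage: n = 3 candidates (as if from one call), k = 2 seeds.
candidates = [
    "who founded acme corp",
    "who founded acme corporation",
    "acme corp headquarters city",
]
seeds = select_diverse_seeds(candidates, k=2)
```

Each selected seed would then start its own parallel trajectory. An embedding-based distance could replace the token-level one without changing the selection loop.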

Paper Details:

Authors: Sidhaarth Murali, João Coelho, Jingjie Ning, João Magalhães, Bruno Martins, Chenyan Xiong
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
arXiv ID: 2606.17209
