SANA: What Matters for QA Agents over Massive Data Lakes?

Exploratory question answering (EQA) over data lakes requires an LLM agent to discover relevant sources, analyze retrieved data, and adapt its actions based on intermediate results.

End-to-end accuracy alone cannot distinguish failures in search, planning, data analysis, or the agent’s Action Policy: its decisions about what to do next and when to submit an answer.

We present SANA (Search Agent Navigation Ablation framework), a diagnostic ablation framework that transforms EQA tasks into runtime profiles containing gold source sequences, sanitized subquestions, and execution records.
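As a rough illustration of what such a runtime profile might hold, the following Python sketch models the three ingredients named above. The class names, field names, the one-source-per-subquestion alignment, and the sample values are all assumptions made for illustration; they are not taken from SANA's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical record of one reference step: what was asked, where, and the outcome."""
    subquestion: str
    source: str
    result: str


@dataclass(frozen=True)
class RuntimeProfile:
    """Hypothetical container for one EQA task (field names are assumed)."""
    task_id: str
    gold_source_sequence: Tuple[str, ...]
    sanitized_subquestions: Tuple[str, ...]
    execution_records: Tuple[ExecutionRecord, ...]

    def is_aligned(self) -> bool:
        # Assumption: each subquestion pairs with exactly one gold source and one record.
        n = len(self.gold_source_sequence)
        return n == len(self.sanitized_subquestions) == len(self.execution_records)


# Made-up example task, for illustration only.
profile = RuntimeProfile(
    task_id="demo-001",
    gold_source_sequence=("stations.csv", "rainfall_2020.csv"),
    sanitized_subquestions=(
        "Which stations are in the target region?",
        "What is the mean rainfall at those stations?",
    ),
    execution_records=(
        ExecutionRecord("Which stations are in the target region?", "stations.csv", "3 stations"),
        ExecutionRecord("What is the mean rainfall at those stations?", "rainfall_2020.csv", "41.2"),
    ),
)
```

Keeping the profile immutable is a design choice of this sketch: idealized tools built from it can then be reused across runs without risk of one run altering the reference data.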

SANA uses these profiles to construct idealized search, planning, and data-analysis tools, allowing each component to be ablated; the residual gap is diagnostic evidence for policy failures.
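The ablation arithmetic can be sketched as follows. This is one plausible reading, not the paper's stated definition: it assumes each component's contribution is measured as the accuracy gained when that component alone is replaced by its idealized tool, and that the residual gap is the shortfall from perfect accuracy when all three idealized tools are supplied together. The function name, the dictionary keys, and the numbers are hypothetical and do not report any result.

```python
from typing import Dict

COMPONENTS = ("search", "planning", "analysis")


def ablation_report(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Turn per-configuration accuracies into per-component gains and a residual gap.

    Expected keys (assumed naming): "baseline", "ideal_search",
    "ideal_planning", "ideal_analysis", and "all_ideal".
    """
    baseline = accuracy["baseline"]
    # Gain from swapping in one idealized tool while the rest stay unchanged.
    report = {name: accuracy[f"ideal_{name}"] - baseline for name in COMPONENTS}
    # Errors that remain with every tool idealized are attributed to the policy.
    report["residual_gap"] = 1.0 - accuracy["all_ideal"]
    return report


# Made-up accuracies, for illustration only.
example = {
    "baseline": 0.25,
    "ideal_search": 0.5,
    "ideal_planning": 0.25,
    "ideal_analysis": 0.75,
    "all_ideal": 0.875,
}
report = ablation_report(example)
```

With these made-up inputs, the largest single-component gain comes from idealized analysis, planning adds nothing, and the remaining 0.125 shortfall is what this reading would attribute to the action policy.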

To illustrate SANA as a reusable evaluation framework, we adapted two recent EQA benchmarks, LakeQA and KramaBench, and evaluated lightweight and mid-sized agents under fixed prompts, budgets, data lakes, and runtimes.

Across both benchmarks, data analysis is a consistent bottleneck while planning is less so. Search is a major limitation in LakeQA’s large data-lake setting, but less so in the smaller-scale KramaBench.

SANA thus decomposes end-to-end task accuracy into a diagnosis of where data-lake agents fail, and allows for systematic comparisons of progress in search, planning, data analysis, and agent design.