PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts
Abstract: Large Reasoning Models (LRMs) embedded in agentic frameworks have transformed information retrieval from static, long-context question answering into open-ended exploration. Yet real-world use requires models to discover and synthesize “long-tail” facts from dispersed sources, a capability that remains under-evaluated.
We introduce PolitNuggets, a multilingual benchmark for agentic information synthesis in which agents construct political biographies for 400 global elites, covering over 10,000 political facts. We standardize evaluation with an optimized multi-agent system and propose FactNet, an evidence-conditional protocol that scores discovery, fine-grained accuracy, and efficiency.
Across models and settings, we find that current systems often struggle with fine-grained details, and vary substantially in efficiency. Finally, using benchmark diagnostics, we relate agent performance to underlying model capabilities, highlighting the importance of short-context extraction, multilingual robustness, and reliable tool use.
Paper Details:
- Authors: Yifei Zhu
- Date: 13 May 2026
- Subject: Artificial Intelligence (cs.AI)
- DOI: 10.48550/arXiv.2605.14002