PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts
Abstract: Large Reasoning Models (LRMs) embedded in agentic frameworks have transformed information retrieval from static, long-context question answering into open-ended exploration. Yet real-world use requires models to discover and synthesize “long-tail” facts from dispersed sources, a capability that remains under-evaluated.
We introduce PolitNuggets, a multilingual benchmark for agentic information synthesis in which agents construct political biographies for 400 global elites, covering over 10,000 political facts. We standardize evaluation with an optimized multi-agent system and propose FactNet, an evidence-conditional protocol that scores discovery, fine-grained accuracy, and efficiency.
Across models and settings, we find that current systems often struggle with fine-grained details, and vary substantially in efficiency. Finally, using benchmark diagnostics, we relate agent performance to underlying model capabilities, highlighting the importance of short-context extraction, multilingual robustness, and reliable tool use.
Paper Details:
- Authors: Yifei Zhu
- Date: 13 May 2026
- Subject: Artificial Intelligence (cs.AI)
- DOI: 10.48550/arXiv.2605.14002