Can LLM Teams Play What? Where? When?

Abstract: Large language models (LLMs) remain limited on tasks requiring indirect reasoning, cultural knowledge, and coordinated hypothesis testing. We investigate whether team-based interaction improves LLM performance in What? Where? When? (ChGK), a quiz game designed to reward collective reasoning.

We introduce three team strategies: Voting, Silent Team (the captain observes only the members' final answers), and Talkative Team (the captain observes both answers and rationales). To minimize data leakage, we evaluate these strategies on a dataset of 572 ChGK questions released in 2025.
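
The three strategies can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Response` type, the `normalize` helper, the prompt wording, and the `captain` callable (a stand-in for a call to the captain model) are all assumptions introduced here.

```python
from collections import Counter
from typing import Callable, NamedTuple


class Response(NamedTuple):
    """One team member's output (hypothetical structure)."""
    model: str
    answer: str
    rationale: str


def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def voting(responses: list[Response]) -> str:
    # Majority vote over normalized answers; ties go to the answer seen first.
    counts = Counter(normalize(r.answer) for r in responses)
    return counts.most_common(1)[0][0]


def captain_prompt(question: str, responses: list[Response],
                   show_rationales: bool) -> str:
    # Build the text shown to the captain (wording is illustrative only).
    lines = [f"Question: {question}", "Teammate answers:"]
    for r in responses:
        line = f"- {r.model}: {r.answer}"
        if show_rationales:
            line += f" | rationale: {r.rationale}"
        lines.append(line)
    lines.append("Give the team's final answer.")
    return "\n".join(lines)


def silent_team(question: str, responses: list[Response],
                captain: Callable[[str], str]) -> str:
    # Silent Team: the captain sees final answers only.
    return captain(captain_prompt(question, responses, show_rationales=False))


def talkative_team(question: str, responses: list[Response],
                   captain: Callable[[str], str]) -> str:
    # Talkative Team: the captain sees answers and rationales.
    return captain(captain_prompt(question, responses, show_rationales=True))
```

In this sketch the only difference between the two captain-based strategies is whether rationales are included in the captain's input, which mirrors the distinction drawn above.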

Using six recent large-scale open models, we show that team-based strategies outperform single-model baselines, yielding gains of up to 20 percentage points in accuracy. The best team achieves 44.23% accuracy and approaches human team performance on the questions for which human statistics are available.

Analysis of inter-model diversity reveals that disagreement strongly predicts lower accuracy, but explanatory communication substantially mitigates performance drops. We further examine captain behavior and find no evidence of self-preference bias; access to peer rationales improves captain judgments.
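
One simple way to examine the link between disagreement and accuracy is sketched below. The abstract does not state which diversity measure is used, so the number of distinct normalized answers per question is an assumption made here for illustration, as are the function names and the record layout.

```python
from collections import defaultdict
from typing import Iterable


def distinct_answers(answers: list[str]) -> int:
    # Assumed disagreement proxy: count of distinct answers after
    # lowercasing and collapsing whitespace.
    return len({" ".join(a.lower().split()) for a in answers})


def accuracy_by_disagreement(
    records: Iterable[tuple[list[str], bool]],
) -> dict[int, float]:
    """Group questions by disagreement level and compute team accuracy.

    Each record is (member_answers, team_answer_is_correct).
    """
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for answers, correct in records:
        bucket = totals[distinct_answers(answers)]
        bucket[0] += int(correct)  # correct count
        bucket[1] += 1             # question count
    return {level: c / n for level, (c, n) in sorted(totals.items())}
```

Running this separately on Silent Team and Talkative Team outcomes would show, per disagreement level, how much of the accuracy drop remains when the captain can read rationales.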

Overall, LLM teams function primarily as answer selection and error-filtering mechanisms rather than generators of novel solutions. Our findings highlight the importance of interaction and suggest adaptive strategies as a promising direction for multi-agent systems.
