When Does Personality Composition Matter for Multi-Agent LLM Teams?

Abstract: Personality prompting shapes how large language models communicate, yet whether these behavioral shifts affect objective task outcomes remains under-explored. Prior work shows that agents prompted with low agreeableness produce adversarial language, while those prompted with high agreeableness become cooperative, but the relationship between communication style and task performance has not been systematically examined across multiple domains.

In this work, we investigate whether personality composition matters for multi-agent team performance by manipulating personality traits across frontier LLMs on three task domains: structured coding, open-ended research collaboration, and competitive bargaining. We find that personality effects depend critically on task structure.

In coding tasks, low agreeableness leads to large communication shifts that have little effect on milestone completion. In open-ended collaboration and bargaining, the same manipulation substantially degrades performance. We discuss implications for multi-agent system design and the limits of personality manipulation.
