Causal Foundations of Collective Agency
Abstract: A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual.
More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems.
We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group’s joint actions as rational and goal-directed successfully predicts its behavior.
We formalize this perspective on collective agency using causal games — which are causal models of strategic, multi-agent interactions — and causal abstraction — which formalizes when a simple, high-level model faithfully captures a more complex, low-level model.
We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms.
Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.