Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale

Abstract: Personalization algorithms determine what content users encounter on online platforms. Auditing these systems is difficult because independent auditors have only black-box access to the algorithms, while personalization depends on users’ attributes, behavior, and evolving interaction histories. Existing auditing methods face a tradeoff: studies with real users capture realistic behavior but are costly and hard to control, whereas sock-puppet audits scale more easily but often rely on scripted behavior that limits realism. Beyond this, both approaches struggle to decouple user attributes from user behavior, limiting our ability to causally understand personalization.

To address this gap, we introduce a framework for black-box audits of personalization algorithms using generative AI agents as behavioral engines for synthetic accounts. Each agent is instantiated with a fixed persona, grounded in demographic and political survey data, and interacts with a platform’s content by reasoning about it and choosing actions. Because behavior is fixed within each persona while platform-visible signals such as age, gender, or location can be experimentally perturbed, our design enables counterfactual auditing of how platforms respond to user attributes.
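
The counterfactual design described above can be illustrated with a minimal sketch. It is not the framework's actual implementation: the class names, the example personas, the profile values, and the rule-based `choose_action` (a stand-in for the generative agent) are all assumptions made for illustration. The point it shows is that behavior depends only on the persona, while the platform-visible profile varies across conditions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona: drives behavior and is held fixed across conditions."""
    name: str
    ideology: str


@dataclass(frozen=True)
class Profile:
    """Hypothetical platform-visible signals that the experiment perturbs."""
    age: int
    gender: str
    location: str


@dataclass(frozen=True)
class Account:
    persona: Persona
    condition: str
    profile: Profile


def choose_action(persona: Persona, post_leaning: str) -> str:
    """Toy stand-in for the agent's reasoning step.

    It takes the persona and the content only, never the profile, so
    behavior is identical across counterfactual conditions.
    """
    return "like" if post_leaning == persona.ideology else "skip"


def build_accounts(personas, conditions, replicates):
    """Cross every persona with every condition, with repeated accounts per cell."""
    return [
        Account(persona, name, profile)
        for persona, (name, profile), _ in product(
            personas, conditions.items(), range(replicates)
        )
    ]


# Illustrative values only; none of these come from the study.
personas = [Persona("persona_a", "left"), Persona("persona_b", "right")]
conditions = {
    "baseline": Profile(35, "female", "Ohio"),
    "gender_swapped": Profile(35, "male", "Ohio"),
    "age_shifted": Profile(65, "female", "Ohio"),
}
accounts = build_accounts(personas, conditions, replicates=2)
print(len(accounts), "accounts")
```

Because `choose_action` never reads the `Profile`, any difference in what the platform serves to two accounts that share a persona can be attributed to the perturbed signal rather than to behavior.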

As a case study, we deploy 1,120 agents on X shortly after the 2024 U.S. election, spanning 14 personas and three counterfactual conditions, collecting over 200,000 content exposures. We find that X’s algorithmic feed amplifies toxic, polarizing, political, and right-leaning content relative to the chronological feed, with amplification varying sharply by user ideology. Counterfactual analyses show that demographic signals affect content delivery in persona-dependent ways: pooled effects are largely null, while subgroup-level effects vary in direction and magnitude. Our work establishes GenAI-based agents as a new tool for algorithmic auditing.
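
The two kinds of comparison reported above can be sketched on toy data. The metric definitions here are assumptions, not necessarily those used in the study: amplification is taken as the difference in the share of flagged content between the algorithmic and chronological feeds, and an attribute effect as the difference in that share between two counterfactual conditions. The persona labels, condition names, and counts are invented so that subgroup effects cancel in the pooled estimate.

```python
from collections import defaultdict


def rate(rows, flag):
    """Share of exposures in `rows` for which `flag` is true."""
    return sum(r[flag] for r in rows) / len(rows)


def amplification(rows, flag):
    """Algorithmic-feed share minus chronological-feed share (assumed metric)."""
    algo = [r for r in rows if r["feed"] == "algorithmic"]
    chrono = [r for r in rows if r["feed"] == "chronological"]
    return rate(algo, flag) - rate(chrono, flag)


def attribute_effect(rows, flag, treated, control):
    """Difference in `flag` share between two counterfactual conditions."""
    t = [r for r in rows if r["condition"] == treated]
    c = [r for r in rows if r["condition"] == control]
    return rate(t, flag) - rate(c, flag)


def by_persona(rows):
    """Group exposures by persona label."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["persona"]].append(r)
    return dict(groups)


def make_rows(persona, condition, feed, n, n_political):
    """Build n toy exposures, the first n_political of which are political."""
    return [
        {"persona": persona, "condition": condition, "feed": feed,
         "political": i < n_political}
        for i in range(n)
    ]


# Invented counts, chosen so the two subgroup effects have opposite signs.
rows = (
    make_rows("left", "baseline", "algorithmic", 10, 2)
    + make_rows("left", "perturbed", "algorithmic", 10, 4)
    + make_rows("right", "baseline", "algorithmic", 10, 4)
    + make_rows("right", "perturbed", "algorithmic", 10, 2)
    + make_rows("left", "baseline", "chronological", 10, 1)
    + make_rows("right", "baseline", "chronological", 10, 1)
)

baseline = [r for r in rows if r["condition"] == "baseline"]
amp = amplification(baseline, "political")

algo = [r for r in rows if r["feed"] == "algorithmic"]
pooled = attribute_effect(algo, "political", "perturbed", "baseline")
subgroup = {
    name: attribute_effect(group, "political", "perturbed", "baseline")
    for name, group in by_persona(algo).items()
}
print(amp, pooled, subgroup)
```

In this toy setup the pooled effect is zero even though each persona shows a nonzero effect, which is why a null pooled estimate alone cannot rule out persona-dependent responses to a demographic signal.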
