Google DeepMind is worried about what happens when millions of agents start to interact

EXECUTIVE SUMMARY

Google DeepMind is funding research into the potential dangers of situations where millions of different AI agents interact with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of agents that can carry out tasks without human oversight and follow instructions given to them by other agents creates a whole new class of risk.

In an effort to address this, Google DeepMind, which made agent-based tools a centerpiece of Google I/O last month, has teamed up with several other organizations to announce a $10 million funding pot for researchers to study the behavior of multi-agent systems and come up with ways to prevent unsafe scenarios. Joining Google DeepMind are Schmidt Sciences, a philanthropic foundation set up by Eric and Wendy Schmidt; ARIA, the UK government’s moonshot agency; the Cooperative AI Foundation, a UK-based nonprofit research outfit; and Google’s charitable arm, Google.org.

I asked Shah and James Fox, who leads the Science of Trustworthy AI program at Schmidt Sciences, what they hope to achieve with that $10 million. It’s no small sum, but it’s dwarfed by the budgets commanded by Google DeepMind’s own research teams. The aim is to kick-start research outside tech companies, says Shah: “The strength of academia is that it can look really quite far into the future and do the kind of work that isn’t top of mind at industry labs.” “The main issue is that there just isn’t really a field of research for multi-agent safety yet,” he adds. “And we would like there to be.”

The concern is that as more and more AI agents get deployed and begin working together, we could hit a tipping point where imagined scenarios become real. “We see this with humanity, too,” says Shah. “Our institutions can accomplish things that no individual human can.” Shah thinks we have a few more months to go before agents are deployed throughout the economy in numbers that make potential risks a real concern. He wants to get ahead of that moment.

Risky business

What risks are we talking about, exactly? The possibilities that Shah and Fox have in mind mostly boil down to supercharged versions of bad things that happen on the internet already: scams, prompt injections (where an AI agent is fed malicious instructions, turning it into a self-guiding piece of malware), other forms of cyberattack. We look at what humans do now and ask what the agent version of that would be, says Shah. “We’ve got this digital commons that is integral to how society works, and you really want to ensure that this doesn’t descend into just absolute anarchy,” says Fox.

(I asked Shah if they were considering any worst-case scenarios more on the doomer end of the spectrum, such as widespread economic collapse. “Certainly not if we’re talking by the end of the year,” he said. That’s only six months away, I pointed out. He laughed. “Okay, a while after that.”)

Shah and Fox both think that the only way to understand what might happen when large numbers of multi-agent systems interact with each other is to run realistic simulations. They want researchers to drop AI agents into sandboxes and study what they do. You can’t predict what’s going to happen by studying single agents, or even small groups of agents, in isolation. You can’t assume that AI agents underpinned by LLMs will always act rationally, says Fox. And the complexity comes from having huge numbers of interactions at once. Some researchers, including a team at Google DeepMind, have argued that artificial general intelligence (if possible at all) could come not from a single super-smart model but from a kind of agent hive mind, where the capabilities of the whole add up to more than the sum of its parts.
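
To make the sandbox idea concrete, here is a minimal toy sketch in Python. It is not the funders' methodology or any real agent framework: the function `simulate_spread` and all of its parameters are hypothetical, and the "agents" are just numbered slots that adopt an injected instruction with a fixed probability when another agent passes it along. The point is only that the population-level curve depends on how many interactions happen, which cannot be read off from one agent's behavior alone.

```python
import random


def simulate_spread(num_agents, contacts_per_round, follow_prob, rounds, seed=0):
    """Toy model: return the count of compromised agents after each round.

    Assumptions (all hypothetical): agent 0 starts out carrying an injected
    instruction, every compromised agent messages `contacts_per_round`
    randomly chosen agents per round, and a clean agent obeys the forwarded
    instruction with probability `follow_prob`.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    compromised = {0}
    history = [len(compromised)]
    for _round in range(rounds):
        newly_compromised = set()
        for _sender in sorted(compromised):
            for _contact in range(contacts_per_round):
                receiver = rng.randrange(num_agents)
                if receiver not in compromised and rng.random() < follow_prob:
                    newly_compromised.add(receiver)
        compromised |= newly_compromised
        history.append(len(compromised))
    return history


if __name__ == "__main__":
    # Same per-agent behavior, different population sizes and contact rates.
    for num_agents, contacts in ((10, 1), (1000, 1), (1000, 5)):
        print(num_agents, contacts, simulate_spread(num_agents, contacts, 0.2, 10))
```

Holding `follow_prob` fixed while varying `num_agents` and `contacts_per_round` gives a rough feel for why small-group studies may not extrapolate. A realistic sandbox would replace the coin flip with actual LLM-backed agents, which, as Fox notes, cannot be assumed to act rationally.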

Lack of trust

Google DeepMind is not the only top AI firm warning about the risks of the technology it is building. A couple of weeks ago, Anthropic published guidelines for deploying AI agents based on an approach to cybersecurity known as zero trust, which starts with the assumption that a computer system is vulnerable, an agent is an attacker, and a breach will happen. Refael Angel, cofounder and CTO of Akeyless, a cybersecurity firm based in Tel Aviv, agrees that understanding the new risks introduced by agent-based systems is crucial. Every approach to security in the past has assumed that the machine in question was software written by a human, doing fixed things on fixed paths, says Angel: “An agent breaks all of those assumptions. It reasons, it improvises, and it can be hijacked by a single sentence buried in a document it was asked to read.”
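
As a rough illustration of what a zero-trust posture could look like for an agent, the Python sketch below denies every tool call by default and grants permissions based on where an instruction originated, not on what the instruction says. It does not reproduce Anthropic's guidelines or any vendor's product: `ALLOWED_TOOLS`, `ActionRequest`, `authorize`, and the tool names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy table: which tools each instruction origin may trigger.
# Text found inside documents is treated as data, so it can trigger nothing.
ALLOWED_TOOLS = {
    "user": {"read_file", "send_email"},
    "peer_agent": {"read_file"},
    "document": set(),
}


@dataclass(frozen=True)
class ActionRequest:
    tool: str    # the tool the agent wants to call
    origin: str  # where the instruction that prompted the call came from


def authorize(request: ActionRequest) -> bool:
    """Default-deny check: unknown origins and unlisted tools are refused."""
    return request.tool in ALLOWED_TOOLS.get(request.origin, set())


if __name__ == "__main__":
    # An instruction buried in a document asks the agent to send an email.
    print(authorize(ActionRequest(tool="send_email", origin="document")))
    # The same request made directly by the human user.
    print(authorize(ActionRequest(tool="send_email", origin="user")))
```

The design choice here is that the sentence Angel describes, buried in a document, never gains authority of its own. The hard part in practice, which this sketch assumes away, is reliably tracking the origin of each instruction once it has passed through a model.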

Angel welcomes this new funding. “No single lab should author the safety standards everyone else has to trust,” he says. But he cautions that safety researchers can overlook boring problems that are already here in favor of more exotic hypothetical ones. And yet, Fox notes, risks that were hypothetical a few years ago are now very real: “The future’s come more quickly than perhaps expected.”
