You Can Now Sound the Alarm on AI Behaving Badly

Writing AI Lab each week means I occasionally encounter AI models that behave badly and bizarrely. Usually, there’s nothing to be done about it, save for sharing those tales with you. But that could soon change.

A group of AI researchers has set up a crowdsourced website, Flaw Reporting for AI (FLARE-AI), for reporting and tracking AI harms. If, for example, a chatbot generates malware or a bomb-making recipe, leaks personal information, or triggers delusional thinking in users, FLARE-AI could be used to sound the alarm. The open source code behind the system allows others to verify an issue and route reports to model makers, as well as organizations like MITRE, a nonprofit that tracks problems with technical systems. It’s a bit like Downdetector, which compiles real-time user reports for global service outages affecting things like apps and websites.
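To make the verify-then-route idea concrete, here is a minimal Python sketch. It is not FLARE-AI's actual code or data format: the `FlawReport` fields, the harm categories, the two-confirmation threshold, and the `"external-tracker"` recipient are all assumptions made up for illustration, loosely based on the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Harm(Enum):
    # Hypothetical categories, taken from the examples in the article.
    MALWARE = "malware"
    DANGEROUS_INSTRUCTIONS = "dangerous_instructions"
    PRIVACY_LEAK = "privacy_leak"
    PSYCHOLOGICAL = "psychological"


@dataclass
class FlawReport:
    model: str              # name of the AI system (made-up in the usage below)
    developer: str          # the model maker that should hear about the flaw
    harm: Harm
    description: str
    confirmations: int = 0  # independent reproductions by other users


# Assumed rules: security-style harms also go to an outside tracker,
# and a report counts as verified after two independent confirmations.
SECURITY_HARMS = {Harm.MALWARE, Harm.PRIVACY_LEAK}
MIN_CONFIRMATIONS = 2


def route(report: FlawReport) -> list[str]:
    """Return who should receive the report, or [] if it is not yet verified."""
    if report.confirmations < MIN_CONFIRMATIONS:
        return []
    recipients = [report.developer]
    if report.harm in SECURITY_HARMS:
        recipients.append("external-tracker")
    return recipients


verified = FlawReport("example-chatbot-1", "ExampleAI", Harm.MALWARE,
                      "Generated a working keylogger on request", confirmations=2)
unverified = FlawReport("example-chatbot-1", "ExampleAI", Harm.PSYCHOLOGICAL,
                        "Reinforced a user's delusional belief", confirmations=0)
```

In this sketch, an unverified report goes nowhere until other users reproduce it, which is one simple way a crowdsourced system might filter noise before alerting a model maker.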

The website is another step in the group’s ongoing work with AI reporting, which I first wrote about last year. Members of the group also consulted on a congressional bill announced in June, which would see the US government take a central role in tracking this kind of AI misbehavior.

“Right now, there is no centralized, accountable way to report flaws in AI systems,” says Avijit Ghosh, an artificial intelligence policy researcher at Hugging Face who co-led development of FLARE-AI with computer scientists Elaine Zhu and Shayne Longpre.

The alarm system was developed in collaboration with 49 AI experts from 32 different organizations. In a paper outlining the work, the researchers argue that their initiative could prove crucial as AI is adopted more widely and as agentic systems gain greater power. The lack of a consistent way to report AI flaws is a significant problem, they believe.

“I think it’s a really good initiative,” says Jessica Ji, a researcher at the think tank Center for Security and Emerging Technology. Ji says the researchers are right to note that existing reporting mechanisms are fragmented and that AI models are black boxes. “I’m in support of anything that makes AI more transparent,” she says.

Though bugs and cybersecurity problems get a lot of attention, especially of late, Ghosh tells me that problems with AI systems span topics like psychological harm, discrimination or bias, and misinformation. He adds that different companies have different standards around such issues, which means some problems go unrecognized. “In the absence of a coordinated disclosure system, there are no external mechanisms to enforce transparency,” Ghosh says.

A spate of recent incidents involving popular AI tools shows how easily the technology can go bad.

This week, a company called LayerX disclosed a way to dupe AI-infused web browsers, including OpenAI’s Atlas and Perplexity’s Comet, into vaulting their guardrails. Convincing the AI model behind the browser that it was playing a game, for example, could lead to the browser going rogue and trying to hack a website. (The companies responsible for the affected browsers have fixed the issue, LayerX says.) And this April, Johann Rehberger, a security researcher, discovered a way to trick Claude into divulging personal data using images generated by ChatGPT.

AI introduces bizarre new kinds of problems, too. Last year, OpenAI was forced to update its models after it discovered that they were overly sycophantic, which sometimes appeared to encourage delusional thinking.

Rumman Chowdhury, the CEO and founder of Humane Intelligence PBC, says FLARE-AI could give many AI developers a useful way to implement issue reporting for their tools. But she adds that such initiatives often come with serious challenges.

One is managing a flood of reported issues, many of which may not be serious. Another is ensuring reporting schemes are backed by credible and authoritative organizations.

Last month’s congressional bill could put some US government heft behind an effort like FLARE-AI. The legislation, introduced by Representatives Deborah Ross, Jeff Hurd, and Don Beyer, would require the National Institute of Standards and Technology to develop standards around AI flaw reporting and to maintain a centralized AI flaw reporting database. Ghosh and his co-leads say this would incentivize AI developers to address issues in their systems and let users examine the safety of different systems for different use cases.

The need for new ways to report AI harms only seems likely to grow. Agentic systems like OpenClaw have greater potential to do harm, as do models that are more capable of probing and hacking computer systems. I may be using FLARE-AI to report my own misadventures soon enough.

This is an edition of Will Knight’s AI Lab newsletter.