Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Anthropic 的安全警告可能适得其反——政府已叫停其最强大的 AI 模型

The U.S. government on Friday ordered Anthropic to immediately shut off access to two of its most powerful AI models — Claude Fable 5 and Claude Mythos 5 — citing national security concerns. Anthropic announced on X that it has complied, but it made clear it thinks the government got this one wrong. 美国政府周五以国家安全为由,下令 Anthropic 立即关闭其两款最强大 AI 模型——Claude Fable 5 和 Claude Mythos 5——的访问权限。Anthropic 在 X 上宣布已遵从该指令,但明确表示认为政府此举有误。

The directive, which Anthropic said it received on Friday at 5:21 pm ET, forces the company to disable both models for all users worldwide — not just the foreign nationals the government’s export control order was nominally aimed at. Access to Anthropic’s other models isn’t affected. Anthropic 表示,该指令于美东时间周五下午 5:21 送达,强制要求公司在全球范围内对所有用户禁用这两款模型,而不仅仅是政府出口管制令名义上针对的外国公民。Anthropic 的其他模型访问权限不受影响。

Why does any of this matter? Mythos is Anthropic’s most capable AI model, one the company previewed in early April and has kept tightly restricted ever since because of what Anthropic described as its exceptional ability to find security vulnerabilities in software. According to Anthropic, Mythos identified flaws in every major operating system and web browser it tested, so rather than release it broadly, the company launched a controlled program called Project Glasswing, sharing it with roughly 50 vetted organizations, including Amazon, Apple, Google, Microsoft, and CrowdStrike, to use for defensive cybersecurity work. 为什么这件事很重要?Mythos 是 Anthropic 能力最强的 AI 模型。公司在 4 月初预览了该模型,并因其在发现软件安全漏洞方面的卓越能力而一直对其进行严格限制。据 Anthropic 称,Mythos 在其测试的每一个主流操作系统和网络浏览器中都发现了缺陷。因此,公司没有将其广泛发布,而是启动了一个名为“Project Glasswing”的受控项目,将其分享给包括亚马逊、苹果、谷歌、微软和 CrowdStrike 在内的约 50 家经过审查的机构,用于防御性网络安全工作。

Fable 5, released just three days ago, was Anthropic’s answer to the obvious commercial pressure: a version of Mythos fitted with guardrails that block responses in high-risk areas like cybersecurity and biology, making it safe enough for general release, the company argued. It was immediately the most capable AI model available to the public, according to benchmark tests from Vals AI, a company that tracks AI tech performance. 三天前发布的 Fable 5 是 Anthropic 对明显商业压力做出的回应:公司认为,这是一个配备了防护栏的 Mythos 版本,能够拦截网络安全和生物学等高风险领域的回答,从而使其足够安全以供公开发布。根据追踪 AI 技术性能的 Vals AI 公司的基准测试,它一经发布便成为公众可用的最强 AI 模型。

The government’s directive is framed as an export control action, restricting foreign national access to the models. But in a lengthy blog post, Anthropic says its understanding is that the underlying concern is a claimed jailbreak of Fable 5. So far, the company says, the government has provided only verbal evidence of a “potential narrow, non-universal jailbreak” — one that, as Anthropic describes it, amounts to prompting the model to read a specific codebase and identify software flaws. 政府的指令被定性为一项出口管制行动,旨在限制外国公民对这些模型的访问。但在随后的一篇长篇博文中,Anthropic 表示,据其理解,根本原因在于所谓的 Fable 5 “越狱”问题。公司称,到目前为止,政府仅提供了关于“潜在的、局部的、非普遍性越狱”的口头证据——按照 Anthropic 的描述,这仅仅是诱导模型读取特定的代码库并识别软件缺陷。

And by the way, adds the company, it’s a “level of capability” that’s already widely available in other publicly accessible models, including OpenAI’s GPT-5.5. It’s also used routinely by cybersecurity professionals for defensive purposes, says Anthropic. 此外,公司补充道,这种“能力水平”在其他公开可用的模型(包括 OpenAI 的 GPT-5.5)中已经非常普遍。Anthropic 表示,网络安全专业人员也经常将其用于防御目的。

Anthropic’s broader argument is that its strongest safeguards operate through independent classifier systems that function separately from the model itself, meaning that even if someone convinces Fable to keep talking past a refusal, the underlying protections against the most dangerous outputs remain in place. Anthropic 更广泛的论点是,其最强大的安全保障是通过独立于模型本身运行的分类器系统实现的,这意味着即使有人说服 Fable 绕过拒绝机制继续对话,针对最危险输出的底层保护措施依然有效。

Clearly, none of that was enough to stop the government from acting, and Anthropic isn’t hiding its frustration. “We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people,” the company wrote. “If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.” 显然,这些解释不足以阻止政府采取行动,Anthropic 也没有掩饰其挫败感。公司写道:“我们不同意将发现一个局部的潜在越狱作为召回已部署给数亿人的商业模型的理由。如果这一标准在全行业推行,我们认为这将从根本上停止所有前沿模型提供商的新模型部署。”

Anthropic is widely expected to pursue an IPO this year and has staked much of its public identity on being the safety-conscious alternative to its rivals. The irony isn’t lost on observers that the very caution Anthropic displayed in restricting Mythos — which it promoted as a model so dangerous it couldn’t be released publicly — has now apparently attracted exactly the kind of government scrutiny that could disrupt its business most. 外界普遍预计 Anthropic 今年将寻求 IPO,并将其作为竞争对手中“注重安全”的替代者作为其公众形象的核心。观察人士不难发现其中的讽刺意味:Anthropic 在限制 Mythos 时所表现出的谨慎——它曾宣传该模型极其危险,以至于无法公开发布——现在显然恰恰招致了最可能扰乱其业务的政府审查。

OpenAI’s Sam Altman must be enjoying this, at least. In April, he told podcaster Ashlee Vance that Anthropic’s handling of Mythos amounted to “fear-based marketing.” “It is clearly incredible marketing to say, ‘We have built a bomb. We were about to drop it on your head. We will sell you a bomb shelter for $100 million,’” Altman said. OpenAI 的 Sam Altman 至少对此感到幸灾乐祸。今年 4 月,他曾告诉播客主持人 Ashlee Vance,Anthropic 对 Mythos 的处理方式等同于“基于恐惧的营销”。Altman 说:“声称‘我们制造了一枚炸弹,我们差点把它扔到你头上,但我们会以 1 亿美元的价格卖给你一个防空洞’,这显然是令人难以置信的营销手段。”

Altman, whose company is also widely expected to pursue an IPO as soon as possible, didn’t predict a government shutdown, but he identified something that has come back to bite Anthropic for now, which is that when you spend months telling the world your AI is uniquely dangerous, the world — the U.S. government included — tends to listen. Altman 的公司也普遍被认为将尽快寻求 IPO。他虽然没有预见到政府会叫停模型,但他指出了一个目前正反噬 Anthropic 的事实:当你花费数月时间告诉全世界你的 AI 具有独特的危险性时,全世界——包括美国政府——往往会当真。