After spooking Trump into safety testing, Anthropic AI models get global release

The US has lifted export curbs on Anthropic’s newest Claude models, Fable 5 and Mythos 5, about three weeks after the Trump administration flagged the models as national security risks. As of today, Anthropic confirmed in a blog post, Fable 5 will be available globally, and US organizations have had access to Mythos 5 restored since June 26.

Anthropic said it is now working with the government to expand Mythos access to a “broader set of domestic and international partners in the Glasswing program.” That program allows cybersecurity researchers at trusted companies to access Mythos for defensive purposes.

In a letter to Anthropic viewed by Reuters and The New York Times, Commerce Secretary Howard Lutnick said Anthropic would “no longer need a license for exports or in-country transfers of its Claude Mythos and Claude Fable AI models.” The letter acknowledged that Anthropic had “taken steps in close coordination with the US government to address the risks” posed by the models.

Lutnick said that Anthropic, facing a longer delay in its models’ releases, agreed to expand its partnership with the government. The company said it also set up a program to work with hackers to red-team its models, and it now has a dedicated internal team monitoring reports of emerging jailbreak threats 24/7.

In the letter, Lutnick reminded Anthropic that the US “reserves the right to re-evaluate the decisions” and reimpose export curbs at any point. But for now, Lutnick joined White House Chief of Staff Susie Wiles in celebrating Fable 5’s redeployment on X. “Over the past two weeks, we have worked closely with Anthropic to analyze and approve Fable 5 to ensure alignment across the US Government and strengthen America’s leadership in AI,” Lutnick said.

Wiles did not directly mention Anthropic but claimed a win for Trump, writing that “the government and private sector have worked together in a way we have never seen before and this foundation of America First is unprecedented. Our shared priority remains: get the best tech deployed as quickly and safely as possible.”

Trade-off: Fable 5 may block routine coding tasks

On June 12, the Commerce Department ordered Anthropic to shut off access to its most advanced models for anyone outside the US. The order emerged from fears that China, Russia, or other countries of concern might exploit the models to attack US infrastructure, like the electric grid or the banking system.

In response, Anthropic shut down all access, as it didn’t have a way to block users by country. In particular, Mythos was viewed as “uniquely attractive to malicious actors who wish to misuse it in cyberattacks,” Anthropic’s blog post said. According to Anthropic, the model “can be used to find and exploit software vulnerabilities more effectively than any other model, and all but the most skilled human security experts,” and those “prodigious cybersecurity capabilities” could be used against the US.

Fable 5 shares the “same underlying model,” Anthropic said, but unlike Mythos 5, it “provides no such unique offensive capabilities.” Designed for the general public, Fable 5 already had the strongest safeguards Anthropic has ever applied to a model, and Anthropic said those safeguards are now even stronger ahead of redeployment.

After weeks of testing, Fable 5 is no longer vulnerable to a bypass method, discovered by Amazon researchers, that was used to identify several software vulnerabilities and that triggered the export curbs. Most troubling, Anthropic said, was a case in which the model was manipulated into producing code that demonstrated how a vulnerability could be exploited.

According to Anthropic, testing confirmed that less advanced rival models on the market, like GPT-5.5 and Kimi K2.7, “could identify the same vulnerabilities as Fable 5 did in the report.” That confirmed that “the reported technique did not expose any unique Mythos-level cyber capabilities,” Anthropic said, and “only involved routine defensive cybersecurity work.”

“Even so, we moved quickly to address the reported bypass,” Anthropic wrote. That jailbreak method is currently blocked in over 99 percent of cases, Anthropic said. However, tightening safeguards came with a “trade-off” that may cause some benign prompts to be blocked “during routine coding and debugging tasks,” the company acknowledged.

“Working closely with the government, we trained an improved safety classifier that targets and blocks the behavior described in the report,” Anthropic said. “Users will be notified if a request to Fable 5 is blocked, and the request will instead be sent to Opus 4.8.”
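The block-then-reroute behavior Anthropic describes can be pictured with a minimal sketch. Nothing here reflects Anthropic’s actual implementation: the function names, the `RoutedResponse` type, the notice text, and the keyword-based `toy_classifier` are all hypothetical stand-ins chosen for illustration, and a real safety classifier would be a trained model rather than a string match.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutedResponse:
    model: str             # label of the model that answered
    text: str              # the model's reply
    notice: Optional[str]  # user-facing message, set only when rerouted


def route_request(
    prompt: str,
    is_blocked: Callable[[str], bool],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> RoutedResponse:
    """Send the prompt to the primary model unless the classifier blocks it."""
    if is_blocked(prompt):
        # Blocked requests are not dropped: the user is told, and the
        # prompt goes to the fallback model instead.
        return RoutedResponse(
            model="fallback",
            text=fallback(prompt),
            notice="Request blocked by the safety classifier; sent to the fallback model.",
        )
    return RoutedResponse(model="primary", text=primary(prompt), notice=None)


# Hypothetical stand-ins, for illustration only.
def toy_classifier(prompt: str) -> bool:
    # Placeholder for a trained classifier: flags any prompt mentioning "exploit".
    return "exploit" in prompt.lower()


def toy_primary(prompt: str) -> str:
    return f"[primary] {prompt}"


def toy_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"
```

With these stand-ins, `route_request("Fix this off-by-one bug", toy_classifier, toy_primary, toy_fallback)` is answered by the primary model, while a benign prompt such as "Review my exploit-mitigation patch" trips the keyword check and is rerouted. That second case is a toy version of the trade-off the company acknowledges, where a stricter filter also catches some harmless coding requests.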

Of course, Anthropic’s new classifier, which helps avoid uniquely dangerous attacks on the models, can make “mistakes,” Anthropic said. The company has long maintained that it’s “probably impossible” to build a model fully “impervious” to jailbreaks, but by ramping up red-teaming, Anthropic hopes to “ensure that we and our safety partners will be the first to find major jailbreaks and fix them before malicious actors can use them for harm.”

The attack Amazon flagged currently works only in a “very small fraction of cases,” where “the model may provide information that isn’t detailed enough to help a cyberattacker,” Anthropic said. Because it is being “cautious,” Anthropic said, “the vast majority of jailbreaks will not successfully unblock dangerous behaviors” and will be “very costly and high-effort to produce.”

“Even if a jailbreak is successful, our extra layers of defense,” which require some blocking of benign requests, “provide additional mitigation,” the company said.

Anthropic’s plan to score jailbreaks

Anthropic’s blog post seems to downplay the threat that Amazon identified, casting it as less risky than what the company considers the greatest threat to governments: universal jailbreaks that can unlock a wide range of vulnerabilities and enable unforeseeable attacks.