The US government’s Anthropic models ban was never about an AI jailbreak

The U.S. government’s enforcement letter to Anthropic, which effectively forced the company to pull its latest AI models offline just before the weekend, should be a wake-up call for any U.S. tech company, AI lab or otherwise.

To catch you up on the news blitz: On Friday afternoon, the U.S. Commerce Department sent Anthropic a letter invoking an obscure export control directive that banned non-Americans, including Anthropic’s employees, from accessing Fable 5 and Mythos 5, citing an unspecified national security concern.

Anthropic said it believes the letter is related to a bypass of the model’s guardrails, but isn’t sure because the letter doesn’t provide specific details. The letter has not been made public. In response, Anthropic shut down both of its top models to all customers to ensure that it complied with the directive.

The result was that the U.S. government successfully forced a tech company to pull its models offline with a swift and unilateral action that didn’t appear to require court approval. Friday’s intervention by the Trump administration shows that the AI industry is not immune to government interference. It’s also a warning to the wider tech industry: comply, or we can shut you and your products down.

Citing sources, Axios described a tense situation over the weekend between the two major players, saying that the “personality differences” between Anthropic and the Trump administration led to the export directive, rather than a technical issue with the AI products.

New details about the issue that emerged over the weekend now cast further doubt on the government’s already shaky reasoning. Katie Moussouris, a cybersecurity veteran and researcher who founded Luta Security, said in a blog post that Anthropic recently shared with her a private copy of a paper written by security researchers describing an alleged guardrail bypass in Fable 5. (The Wall Street Journal reports that the paper’s authors are security researchers at Amazon.)

Moussouris said that Anthropic reached out to ask for her take on the paper. Moussouris’ blog post described how the researchers triggered the guardrail bypass, but said that the bypass itself “should never have triggered an export control.” The difference is largely between asking an AI model to “review code for security issues” versus asking it to “fix this code.” The end result is largely the same, even if the questions are posed slightly differently.

“The behavior described in the paper cannot meaningfully be fixed, and any attempt would only weaken the model for defense,” said Moussouris, who criticized the export control directive as hasty, heavy-handed, and misguided.

Moussouris and dozens of other top security researchers and experts have since called on the Trump administration to revoke the export control order, calling the move to pull advanced cybersecurity capabilities from network defenders in the U.S. “dangerous.”

Past administrations have also made sweeping decisions rooted in knowledge gaps. For instance, the language the U.S. government used during the 2010s to fix export law covering cybersecurity tools that could also be used for cyberattacks was so broad that it inadvertently came close to outlawing legitimate security and vulnerability research. The Trump administration’s directive, however, appears retaliatory.

Justin Hendrix, the editor of Tech Policy Press, said the Trump administration’s move is “likely to raise alarms in foreign capitals about the reliability of American AI for critical applications.” The message is that AI companies in the United States can’t be counted on to operate free of interference from the U.S. government.

The Trump administration hasn’t confirmed why it invoked its export control directive. Did the officials misread the report and freak out? Did Amazon CEO Andy Jassy say something to senior government officials that prompted the reaction, out of caution or spite? Was something lost in translation, or was this a way to pressure Anthropic, with which the administration already has a fractious relationship?

It’s possible that the White House was unaware of the far-reaching consequences of the letter’s demand, and that officials are now scrambling to undo damage of their own making. To quote Hendrix, “the climate is one of a cloud of suspicion that senior officials are picking favorites based on personal and political factors.”

The upshot is that the government has set a dangerous precedent for how much control it intends to wield over the release of American-made software. This time the government took issue with Anthropic; tomorrow it could be anyone else.