Anthropic apologizes for invisible Claude Fable guardrails

Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems. The company says it is reversing course and will be more transparent about when the restrictions kick in, even if that means Fable refuses more queries.

Fable is the first widely available model in Anthropic’s Mythos class of AI systems, a group the company has spent months warning is too dangerous for public release. Anthropic says it has addressed some of those risks by launching Fable with safeguards that prevent it from responding to certain “high-risk” queries. One area in which Anthropic said it would restrict Fable’s responses is distillation, a technique for training smaller AI models using the outputs of larger ones.

In Fable’s system card — a public document AI developers release to explain how a system works — Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly. Users would not be notified that they had triggered the safety measure or informed that the responses had been changed.

Anthropic is now changing its approach to distillation: such queries will fall back to Claude Opus 4.8, Anthropic’s previous flagship model, the company said in a post on X. Anthropic will also prominently tell users when that happens: “You will see this every time it happens.”

This is similar to how Fable handles queries in other high-risk areas. When safety features are triggered in areas like biology, chemistry, and cybersecurity, queries are routed through Opus 4.8 unless they are blocked outright under the company’s broader safety rules, such as those covering drugs, weapons, or other prohibited content. In some cases, notably biology, the safeguards have been calibrated so broadly that Fable is practically unusable for even basic queries, something Anthropic acknowledged in a comment to The Verge.

“Visible safeguards can be probed, so they have to be robust, which takes time to get right,” Anthropic wrote. “Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.”

The change follows intense backlash from the AI research community over Anthropic’s decision to silently limit users suspected of trying to distill Fable into competing models — a safeguard critics warned could also affect third parties trying to evaluate the frontier model. In the system card, Anthropic said newer models’ ability to accelerate AI development justified targeting those requests, noting that “using Claude to develop competing models already violates our Terms of Service.” Anthropic has previously accused Chinese rivals like DeepSeek of unfairly distilling its models on an “industrial” scale.
