Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe’ Version for the Rest of You
Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe’ Version for the Rest of You
Anthropic 为网络安全合作伙伴提供 Mythos 升级版,并为普通用户推出“安全”版本
Anthropic released two new AI models called Claude Fable 5 and Claude Mythos 5 on Tuesday, which the company says have greater capabilities than the Mythos Preview model it released in April to a limited set of tech industry partners. Anthropic has said the initial, limited release stemmed from concerns that the model’s capabilities could be exploited by bad actors to develop hacking tools that could catch defenders off guard.
Anthropic 周二发布了两款名为 Claude Fable 5 和 Claude Mythos 5 的新 AI 模型。该公司表示,这两款模型比其四月份向少数科技行业合作伙伴发布的 Mythos 预览版(Preview)具备更强大的能力。Anthropic 此前曾表示,最初的有限发布是出于担忧,即该模型的能力可能被恶意行为者利用,开发出让防御者措手不及的黑客工具。
Anthropic is currently only releasing Claude Mythos 5 to a limited set of industry partners, many of which received access to Mythos Preview, and the company says it is collaborating with the US government on the rollout.
目前,Anthropic 仅向少数行业合作伙伴发布 Claude Mythos 5,其中许多合作伙伴此前已获得 Mythos 预览版的访问权限。该公司表示,正在与美国政府合作推进此次发布。
Claude Fable 5, which is being publicly released, uses the same underlying model as Mythos 5, but will have “guardrails” in place at launch, the company said Tuesday, that will block the model from answering many user questions related to cybersecurity, biology, and chemistry. These requests will instead be rerouted to an older AI model, Claude Opus 4.8. If Anthropic suspects a user is trying to conduct distillation—training a smaller AI model off a larger AI model’s responses—on Claude Fable 5, those requests will also be rerouted to Claude Opus 4.8, the company says.
Claude Fable 5 将面向公众发布,它使用与 Mythos 5 相同的底层模型,但公司周二表示,该模型在发布时将设有“护栏”,以阻止其回答许多与网络安全、生物学和化学相关的用户问题。这些请求将被重定向至较旧的 AI 模型 Claude Opus 4.8。Anthropic 表示,如果怀疑用户试图对 Claude Fable 5 进行“蒸馏”(即利用大型 AI 模型的响应来训练较小的 AI 模型),这些请求同样会被重定向至 Claude Opus 4.8。
In an interview with WIRED, Anthropic’s head of product management, Diane Penn, says that the company has been grappling with the question of how to handle Mythos’ software vulnerability-discovery abilities and other advanced capabilities since before its April release, but that testing and user input since then helped to hone the strategy.
在接受《连线》(WIRED)杂志采访时,Anthropic 产品管理负责人 Diane Penn 表示,自四月份发布之前,公司就一直在努力解决如何处理 Mythos 的软件漏洞发现能力及其他高级功能的问题,而此后的测试和用户反馈帮助公司完善了这一策略。
“We’re trying to make improvements in a way that’s beneficial, even if we don’t have the perfect [solution] for every use case to start,” Penn says. “Out of all the different approaches, this emerged as the most viable and the best one. We just ended up feeling like this was the best product choice for users to get the maximum value out of Fable 5.”
“我们正试图以一种有益的方式进行改进,即使我们一开始并没有针对每个用例都提供完美的解决方案,”Penn 说道,“在所有不同的方法中,这是目前最可行、也是最好的一种。我们最终认为,这是让用户从 Fable 5 中获得最大价值的最佳产品选择。”
For now, Penn says that the protective mechanism is built to err on the side of caution, meaning some user queries may be routed to the less capable AI model even if they’re benign. Over time, Anthropic hopes to make its classifiers more precise, but Penn says this was the only safe way the company could release the model broadly at this time.
Penn 表示,目前的保护机制设计倾向于谨慎,这意味着即使是良性的用户查询,有时也可能被重定向到能力较弱的 AI 模型。Anthropic 希望随着时间的推移,能使其分类器更加精确,但 Penn 指出,这是目前公司能够广泛发布该模型的唯一安全途径。
The company said on Tuesday that in addition to offering Claude Mythos 5 to Project Glasswing partners, it is also giving access to “select biology researchers.” Additionally, Anthropic noted in its blog post about Tuesday’s launch that it is providing unrestricted versions to these small groups of customers “until our trusted access program is available,” hinting at future plans to expand access even more. Since the Mythos launch in April, Anthropic has repeatedly emphasized that eventually its competitors in both the private and even open weight spaces will inevitably also offer models with Mythos-level capabilities.
该公司周二表示,除了向“Project Glasswing”合作伙伴提供 Claude Mythos 5 外,还将向“精选生物学研究人员”开放访问权限。此外,Anthropic 在关于周二发布的博客文章中提到,在“我们的可信访问计划上线之前”,它将向这些小群体客户提供无限制版本,这暗示了未来进一步扩大访问范围的计划。自四月份 Mythos 发布以来,Anthropic 一再强调,其在私有领域甚至开源领域的竞争对手最终也必然会提供具备 Mythos 级别能力的模型。
The ability for Claude Mythos and other new AI models to design hacking tools that can find and exploit vulnerabilities in both new and legacy software has forced tech companies and governments around the world to secure their software defenses before AI models of this level are made broadly available to attackers. Anthropic first released Mythos to industry partners under a consortium called Project Glasswing, with the idea that this could give members a head start in preparing their own systems and weighing global solutions to the threat before a broader release.
Claude Mythos 及其他新 AI 模型能够设计出发现并利用新旧软件漏洞的黑客工具,这迫使全球科技公司和政府在这些级别的 AI 模型广泛提供给攻击者之前,必须加强其软件防御。Anthropic 最初通过名为“Project Glasswing”的联盟向行业合作伙伴发布了 Mythos,其初衷是让成员们在更广泛发布之前,能够抢先准备好各自的系统,并权衡应对这一威胁的全球性解决方案。
Anthropic wrote in an update about Project Glasswing last week: “We’re working as quickly as we can to safely release Mythos-level capabilities in general access. To do so, we’ll need highly robust safeguards that prevent the model’s cyber capabilities from being misused—safeguards that we (and, to our knowledge, all other AI developers) have yet to develop.”
Anthropic 在上周关于 Project Glasswing 的更新中写道:“我们正尽可能快地工作,以安全地向公众发布 Mythos 级别的能力。为此,我们需要极其强大的保障措施,防止模型的网络能力被滥用——而这些保障措施,我们(据我们所知,所有其他 AI 开发商也是如此)尚未开发出来。”
Anthropic says Claude Fable 5—named after the literary form, much like the company’s existing Haiku, Sonnet, and Opus models—offers increased performance on software engineering and tasks that require visual understanding. But that added performance comes at a price. Claude Fable 5 and Claude Mythos 5 will cost developers $10 per million input tokens and $50 per million output tokens—twice as much as Anthropic’s publicly available AI models but cheaper than Mythos Preview.
Anthropic 表示,Claude Fable 5(与公司现有的 Haiku、Sonnet 和 Opus 模型一样,以文学体裁命名)在软件工程和需要视觉理解的任务上提供了更高的性能。但这种性能提升是有代价的。Claude Fable 5 和 Claude Mythos 5 的价格为每百万输入 token 10 美元,每百万输出 token 50 美元——这是 Anthropic 公开 AI 模型价格的两倍,但比 Mythos 预览版便宜。
The neutered release of Claude Fable 5 hints at Anthropic’s business tension of wanting to release a Mythos-class AI model for general use before the tech industry has resolved the cybersecurity concerns of these models. In April, OpenAI also privately launched a model that it said has advanced cybersecurity capabilities and convened a working group similar to Project Glasswing. Both OpenAI and Anthropic have confidentially filed for IPOs and are racing to impress prospective investors before they become public companies as soon as they can this year.
Claude Fable 5 的“阉割版”发布,暗示了 Anthropic 在商业上的两难境地:在科技行业尚未解决此类模型的网络安全担忧之前,它既想发布 Mythos 级别的 AI 模型供通用,又不得不进行限制。四月份,OpenAI 也私下发布了一款声称具备高级网络安全能力的模型,并召集了一个类似于 Project Glasswing 的工作组。OpenAI 和 Anthropic 都已秘密提交了 IPO 申请,并竞相在今年尽快上市之前给潜在投资者留下深刻印象。
Even as an interim solution, though, it remains to be seen how resistant Claude Fable 5’s safeguards are in the wild. Anthropic says in more than 1,000 hours of red-teaming, its testers found no universal jailbreaks for the model. Still, fears about the ability to develop adequate protections underpinned the company’s original justification for why it did not release Mythos-class models to the public in April, and these fears have seemingly persisted.
尽管这只是一个过渡性解决方案,但 Claude Fable 5 的防护措施在实际应用中究竟有多强的抗干扰能力,仍有待观察。Anthropic 表示,在超过 1,000 小时的红队测试中,其测试人员没有发现该模型的通用越狱方法。尽管如此,对于能否开发出足够保护措施的担忧,依然是该公司四月份未向公众发布 Mythos 级别模型的最初理由,而这些担忧似乎依然存在。