Claude Opus 4.8

Introducing Claude Opus 4.8

May 28, 2026

We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator. It’s available today for the same price.

Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a new “dynamic workflows” feature that allows it to tackle very large-scale problems. And fast mode for Opus 4.8, in which the model runs up to 2.5× faster, now costs a third of what fast mode cost for previous models.

Opus 4.8’s capabilities

The table below shows how Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks. More details and a much wider range of capability evaluations are provided in the Claude Opus 4.8 System Card.

Collaborating with Opus 4.8

Early testers have found Claude Opus 4.8 to be more reliable and sharper in its judgment when it’s performing agentic tasks. Below are quotes from many of these testers about their experience collaborating with Opus 4.8:

“Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with.”

“On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability.”

“On CursorBench, Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through.”

“Claude Opus 4.8 delivers the highest score recorded on our Legal Agent Benchmark, and is the first model to break 10% overall on the all-pass standard. For substantive legal work, that’s the kind of accuracy lift that translates directly into how much real attorney work our customers can hand off with confidence.”

“Claude Opus 4.8 feels like a major quality-of-life update over Opus 4.7: faster, easier to collaborate with, and better at carrying context and style direction across a long session. Opus 4.8 is the model I kept trusting for work where voice, taste, and technical execution all have to happen side-by-side.”

“Claude Opus 4.8 is the strongest computer-use and browser-agent model we’ve tested, scoring 84% on Online-Mind2Web, which is a meaningful jump over both Opus 4.7 and GPT-5.5. It stays reflective and on-task in the way our customers’ agent workloads need to be reliable end-to-end.”

“Claude Opus 4.8 uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need to keep running unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7. This release from Anthropic translates directly into faster capability gains for engineers building on Devin.”

“On our long-running evals, Claude Opus 4.8’s analysis was consistently higher quality than prior Opus models. It finished faster and produced richer, more information-dense outputs. Overall, a noticeably better signal-to-noise ratio. The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch.”

“Across CoCounsel Legal, Claude Opus 4.8 delivered meaningful improvements in consistency and reasoning quality compared to prior Opus models. For the high-stakes professional workflows our customers depend on, that reliability matters. As we build fiduciary-grade AI systems for legal and tax professionals, advances like these help raise the standard for trusted AI performance in real-world workflows.”

“Claude Opus 4.8 sets a new bar for enterprise AI. In Genie, Databricks’ AI agent for data and knowledge work, the new Opus model unlocks a step change in agentic reasoning, tackling deeper, multistep questions faster than any prior Opus. Its multimodal strength also lets Genie reason directly over PDFs, diagrams, and other unstructured content at 61% cheaper token cost than Opus 4.7.”

“For financial-document workflows in Hebbia’s orchestrator, Claude Opus 4.8 delivers the same strong quality as Opus 4.7 with noticeably better citation precision and more token efficiency on retrieval, which works incredibly well for the kinds of dense filings our customers run every day.”

One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest, which includes avoiding claims they can’t support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.