Anthropic launches Claude Sonnet 5 as a cheaper way to run agents
As shipping agentic capabilities becomes table stakes among foundation model companies, Anthropic is releasing Claude Sonnet 5, a more powerful and agentic version of the lab’s midsize model. “It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models,” Anthropic said in a blog post.
That framing mirrors what OpenAI and Google have said about their own recent releases. OpenAI’s GPT-5.6 Sol was launched in preview last week, and it is also the firm’s most agentic model yet, allowing users to split work across subagents for longer autonomous tasks. Google’s Gemini 3.5 Flash, which launched in May, was pitched as a shift from a conversational chatbot to an agentic tool that plans, builds, and iterates on real work with minimal human input.
Sonnet 5’s pitch confirms that agentic capability is the new baseline expectation at every price tier. Now the differentiator isn’t going to be who can do agentic work best, but who can do it most cheaply and most reliably without human oversight. Sonnet 5 promises performance close to that of Opus 4.8 at a much lower cost.
Starting Tuesday, Claude Sonnet 5 will be the default model on the free and Pro plans and will be available on every subscription plan. At launch, Sonnet 5 is priced at $2 per million input tokens and $10 per million output tokens through August 31, after which the price rises to $3 per million input tokens and $15 per million output tokens. That makes Sonnet 5 cheaper than Opus 4.8, as well as OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro. (It’s still more expensive than Gemini 3.5 Flash.)
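For a rough sense of what the price change means per job, the arithmetic can be sketched in a few lines of Python. The per-million-token prices are the ones quoted above; the token counts and the `job_cost` helper are hypothetical, chosen only for illustration, and the sketch ignores any discounts or other billing details the article does not mention.

```python
# Prices in USD per million tokens, as quoted in the article.
LAUNCH = {"input": 2.00, "output": 10.00}    # through August 31
STANDARD = {"input": 3.00, "output": 15.00}  # after August 31


def job_cost(input_tokens, output_tokens, prices):
    """Return the USD cost of one job at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical agent run: 400,000 input tokens and 60,000 output tokens.
launch = job_cost(400_000, 60_000, LAUNCH)
standard = job_cost(400_000, 60_000, STANDARD)
print(f"launch: ${launch:.2f}, standard: ${standard:.2f}")
# prints "launch: $1.40, standard: $2.10"
```

Because both the input and output prices rise by the same factor, any job costs 1.5 times as much after August 31 as it does at launch pricing, whatever its mix of input and output tokens.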
According to Anthropic, the new model also improves significantly on its predecessor, Sonnet 4.6, released in February, across agentic skills such as reasoning, tool use, software coding, and knowledge work. For example, on one agentic coding benchmark, Sonnet 5 scores 63.2%, compared to Opus 4.8’s 69.2% and Sonnet 4.6’s 58.1%. On a knowledge work benchmark, Sonnet 5 actually slightly outperforms Opus 4.8, a model known for handling the hardest problems, such as subtle judgment calls and deep research.
“Opus 4.8 is still the model of choice for higher accuracy on these tasks, but Sonnet 5 provides developers with lower-priced options that are of much higher quality than what was previously available,” Anthropic says. “Between Sonnet 5 and Opus 4.8, users can adjust the effort level to find the right balance of cost and performance.”
According to testers cited in the blog post, Sonnet 5 also excels at finishing complex tasks where previous model versions would have stopped short and “checks its own output without explicitly being asked.” “We handed Claude Sonnet 5 a two-part job — update Salesforce account tiers, send a launch announcement to enterprise contacts — and it finished end to end,” Daniel Shepard, a senior engineer at Zapier, said in a statement. “That used to stall halfway. For day-to-day automation, it’s a no-brainer.”
On safety, Sonnet 5 also demonstrates a lower rate of “undesirable behaviors” like cooperation with misuse and deception than its predecessor, making it safer to use in agentic contexts. It’s better at refusing malicious requests and sidestepping hijack attempts in prompt-injection attacks. It also hallucinates and engages in sycophantic behavior at a lower rate than Sonnet 4.6. That said, it’s not on the same level as Opus 4.8 and Claude Mythos Preview when it comes to misaligned behavior. “Evaluations also show that it has a much lower ability to perform dangerous cybersecurity tasks than our current Opus models,” reads the blog post.
Lovable co-founder Fabian Hedin said in a statement that Claude Sonnet 5 “refuses unsafe requests cleanly and consistently.” “At Lovable, we’re putting powerful tools in the hands of millions of builders,” Hedin said. “A model that knows when to say no is just as important as one that knows how to build.”
Updated to correct that the price of output tokens is $15 per million output tokens after August 31.