Claude’s new model is more ‘honest’ when it messes up

Claude’s new model is more ‘honest’ when it messes up

Claude 的新模型在出错时更加“诚实”

Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model’s “honesty.” Anthropic 将于周四发布 Claude Opus 4.8,该公司正在大力宣传该模型的“诚实性”。

According to Anthropic, it trains “all [its] models to be honest — for instance, to avoid making claims that they can’t support.” But it notes that “a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence.” 据 Anthropic 称,他们训练“所有模型保持诚实——例如,避免做出无法支持的断言”。但该公司指出,“人工智能模型的一个普遍问题是,它们有时会草率下结论,在证据不足的情况下自信地展示其工作进展。”

The AI lab claims that early testers have found that Opus 4.8 “is more likely to flag uncertainties about its work and less likely to make unsupported claims.” In the company’s evaluations, Opus 4.8 is “around 4x less likely than its predecessor to allow flaws in code it’s written to pass unremarked.” 该人工智能实验室声称,早期测试人员发现 Opus 4.8 “更有可能标记其工作中的不确定性,且不太可能做出未经证实的断言”。在公司的评估中,Opus 4.8 “忽略其所写代码中缺陷的可能性比前代产品低约 4 倍”。

In addition to the honesty improvements, with Opus 4.8, users can direct the amount of effort Claude puts into a task. Higher-effort responses will use more tokens, giving users the option of lower-effort responses if they don’t want to burn through their rate limits as quickly. 除了诚实性的提升外,通过 Opus 4.8,用户还可以指定 Claude 在任务中投入的精力。更高精力的回复将消耗更多 Token,如果用户不想过快耗尽速率限制,也可以选择低精力的回复。

Anthropic is also launching a feature called “dynamic workflows” in research preview, which the company says will let Claude “take on even bigger tasks.” With dynamic workflows, “Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user.” Anthropic 还推出了名为“动态工作流”(dynamic workflows)的研究预览功能,该公司表示这将使 Claude 能够“承担更大的任务”。通过动态工作流,“Claude 可以规划工作,然后在单次会话中运行数百个并行子代理(在 Opus 4.8 的支持下,这些代理可以运行更长时间)。随后,它会在向用户报告之前验证其输出结果。”