An update on our election safeguards

Apr 24, 2026

People around the world turn to Claude for information about political parties, candidates, and the issues at stake during election time, as well as for answers to simpler questions like when, where, and how to vote. In our view, if AI models can answer these questions well (that is, accurately and impartially), they can be a positive force for the democratic process. Here, we explain what we’re doing to help Claude meet that standard ahead of the US midterms and other major elections around the world this year.

Measuring and preventing political bias

When people ask Claude about political topics, they should get comprehensive, accurate, and balanced responses: responses that help them reach their own conclusions rather than steer them toward a particular viewpoint. That’s why we train Claude to treat different political viewpoints with equal depth, engagement, and analytical rigor, a principle set out in Claude’s constitution. This is built into the model through character training (where we reward the model for producing responses that reflect a set of values and traits), and then reinforced through our system prompts, which carry explicit instructions on political neutrality into every conversation on Claude.ai. (You can read more about this process in our previous post about political bias.)

Before each model launch, we run evaluations to measure how consistently, thoughtfully, and impartially Claude engages with prompts that express views from across the political spectrum. For example, a model that writes a lengthy response defending one position but offers only a single sentence for the opposing one would score poorly. On these evaluations, Opus 4.7 and Sonnet 4.6 scored 95% and 96%, respectively. We’ve published our evaluation methodology and an open-source dataset so that others can replicate or build on our work.
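
The lopsided-response example above can be illustrated with a toy check. This is only a sketch of one symptom (unequal response length) and is not the published evaluation methodology, which this post does not describe in detail. The function name `length_balance` and the sample responses are hypothetical.

```python
def length_balance(response_a: str, response_b: str) -> float:
    """Return the shorter/longer word-count ratio, a value in [0, 1].

    Hypothetical proxy: 1.0 means both responses are equally long,
    values near 0 mean one side received far less text than the other.
    """
    words_a = len(response_a.split())
    words_b = len(response_b.split())
    longest = max(words_a, words_b)
    if longest == 0:
        return 1.0  # two empty responses are trivially balanced
    return min(words_a, words_b) / longest


# Hypothetical pair: a 200-word defense of one position, one short sentence for the other.
long_defense = "word " * 200
short_reply = "One sentence only."

print(length_balance(long_defense, short_reply))  # 3 words vs. 200 words
```

A low ratio flags a pair worth reviewing, but length alone says nothing about depth or analytical rigor, so a real evaluation would need richer grading than this.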

We also welcome feedback and input from third parties and industry experts. We’re currently working with The Future of Free Speech (an independent think tank at Vanderbilt University), the Foundation for American Innovation, and the Collective Intelligence Project on a broader review of model behaviors around freedom of expression, including political conversations.

Enforcing policies and testing our defenses

Our Usage Policy sets clear rules on the use of Claude around elections. Claude can’t be used to run deceptive political campaigns, create fake digital content to influence political discourse, commit voter fraud, interfere with voting systems, or spread misleading information about voting processes.

These policies are backed by robust detection and enforcement. We use automated classifiers to detect signs of potential violations, and we have a dedicated threat intelligence team that investigates and disrupts coordinated abuse efforts. Together, they form an always-on first line of defense, allowing our enforcement to focus on actual misuse without hindering the millions of ordinary conversations happening every day.

To measure how well Claude handles election-related risks, we run a series of tests examining its responses to questions about candidates, voting, and election administration, and how it holds up against attempts at misuse. We first wrote about this approach in 2024. Our latest tests use 600 prompts, based on how people actually talk to Claude about elections, to assess how well Claude follows our election-related Usage Policy. They consist of 300 harmful requests (such as attempts to have Claude generate election misinformation) paired with 300 legitimate requests (such as creating campaign content or civic engagement resources). We assess how well Claude complies with the legitimate requests and declines the harmful ones. Claude Opus 4.7 and Claude Sonnet 4.6 responded appropriately 100% and 99.8% of the time, respectively.
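
As a minimal sketch of how an "appropriate response" rate over such a paired prompt set could be computed, assume each result records whether the prompt was harmful and whether the model complied. The `Result` type, the function names, and the sample data are hypothetical. In particular, the single miss below is invented to show how 599 of 600 rounds to 99.8%; the post does not say which prompts, if any, were mishandled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    harmful: bool   # True if the prompt is a policy-violating request
    complied: bool  # True if the model fulfilled the request


def is_appropriate(result: Result) -> bool:
    # Appropriate means declining harmful requests and complying with legitimate ones.
    return result.complied != result.harmful


def appropriate_rate(results: list[Result]) -> float:
    if not results:
        raise ValueError("no results to score")
    return sum(is_appropriate(r) for r in results) / len(results)


# Hypothetical outcomes: all 300 harmful prompts declined,
# 299 legitimate prompts answered, 1 legitimate prompt wrongly refused.
results = (
    [Result(harmful=True, complied=False)] * 300
    + [Result(harmful=False, complied=True)] * 299
    + [Result(harmful=False, complied=False)]
)

print(f"{appropriate_rate(results):.1%}")  # prints 99.8%
```

Scoring both halves with one rule keeps over-refusal and over-compliance in the same metric, so a model cannot raise its score simply by refusing everything.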

We also test how well Claude holds up against influence operations: coordinated efforts to manipulate public opinion or political outcomes through fake personas, fabricated content, or deceptive amplification. To do this, we use multi-turn simulated conversations that mirror the step-by-step tactics bad actors might use. In our latest evaluations, Sonnet 4.6 and Opus 4.7 responded appropriately 90% and 94% of the time, respectively. Once deployed, these models run with additional monitoring and our system prompt to help further reduce the risk of election-related abuse.

Ahead of launching Mythos Preview and Opus 4.7, we tested for the first time whether models can carry out influence operations autonomously, that is, plan and run a multi-step campaign end-to-end without human prompting. With safeguards and training in place, our latest models refused nearly every task. With our safeguards removed (a step we take to measure a model’s raw capabilities), only Mythos Preview and Opus 4.7 completed more than half the tasks. While these models would still require substantial human direction, the results underscore the need for continued vigilance. We’ll keep running and refining these evaluations, and implement improvements as needed.

Sharing reliable election resources

When people come to Claude for information, we want Claude to share the facts, and, when needed, point people to reliable and up-to-date resources.

One way we help Claude do this is through election banners, which we first launched in 2024, ahead of major elections in the US and elsewhere around the world. When users ask about voter registration, polling locations, election dates, or ballot information on Claude.ai, Claude displays an election banner pointing them to trusted sources. For this year’s US midterm elections, our banner will direct users to TurboVote, a nonpartisan resource from Democracy Works that provides reliable, real-time information about those topics. We’ll implement a similar banner for Brazil’s elections later this year and will look to expand this feature to elections elsewhere in the future.