Our commitment to community safety
Mass shootings, threats against public officials, bombing attempts, and attacks on communities and individuals are an unacceptable and grave reality in today’s world. These incidents are a reminder of how real the threat of violence is—and how quickly violent intent can move from words to action. People may also bring these moments and feelings into ChatGPT. They may ask questions about the news, try to understand what happened, express fear or anger, or talk about violence in ways that are fictional, historical, political, personal, or potentially dangerous. We work to train ChatGPT to recognize the difference—and to draw lines when a conversation starts to move toward threats, potential harm to others, or real-world planning.
We’re sharing what we do to minimize uses of our services in furtherance of violence or other harm: how our models are trained to respond safely, how our systems detect potential risk of harm, and what actions we take when someone violates our policies. We are constantly improving the steps we take to help protect people and communities, guided by input from psychologists, psychiatrists, civil liberties and law enforcement experts, and others who help us navigate difficult decisions around safety, privacy, and democratized access.
How we mitigate risks of harm in ChatGPT.
Our Model Spec lays out our long-standing principles for how we want our models to behave: maximizing helpfulness and user freedom while minimizing the risk of harm through sensible defaults. We work to train our models to refuse requests for instructions, tactics, or planning that could meaningfully enable violence. At the same time, people may ask neutral questions about violence for factual, historical, educational, or preventive reasons, and we aim to allow those discussions while maintaining clear safety boundaries—for example, by omitting detailed, operational instructions that could facilitate harm. The line between benign and harmful uses can be subtle, so we continually refine our approach and work with experts to help distinguish between safe, bounded responses and actionable steps for carrying out violence or other real-world harm.
As part of this ongoing work, we’ve continued expanding our safeguards to help ChatGPT better recognize subtle signs of risk of harm across different contexts. Some safety risks only become clear over time: a single message may seem harmless on its own, but a broader pattern within a long conversation—or across conversations—can suggest something more concerning. Building on years of work in model training, evaluations and red teaming, and ongoing expert input, we have strengthened how ChatGPT recognizes subtle warning signs across long, high-stakes conversations and responds carefully. We’ll share more about this work in the coming weeks.
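To make the idea of conversation-level patterns concrete, here is a minimal sketch of aggregating per-message signals over a window rather than judging each message in isolation. The scoring model, thresholds, and window size are hypothetical illustrations, not a description of OpenAI’s production systems.

```python
from dataclasses import dataclass


@dataclass
class MessageSignal:
    """Hypothetical per-message risk score in [0, 1] and its position in the conversation."""
    index: int
    risk_score: float


def conversation_risk(signals: list[MessageSignal],
                      single_message_threshold: float = 0.9,
                      pattern_threshold: float = 0.6,
                      window: int = 10) -> bool:
    """Flag a conversation when one message is clearly high risk, or when a sustained
    pattern of moderately risky messages emerges over the most recent window.

    All thresholds here are illustrative placeholders.
    """
    if any(s.risk_score >= single_message_threshold for s in signals):
        return True
    recent = signals[-window:]
    if not recent:
        return False
    average = sum(s.risk_score for s in recent) / len(recent)
    return average >= pattern_threshold
```

In a sketch like this, a single borderline message is not enough to trigger review, but a run of them is, which mirrors the point that some risks only become visible across a longer exchange.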
Our safety work also extends to situations where users may be in distress or at risk of self-harm. In these moments, our goal is to avoid facilitating harmful acts, and also to help de-escalate the situation and guide people to real-world support. ChatGPT surfaces localized crisis resources, encourages people to reach out to mental health professionals or trusted loved ones, and in the most serious cases directs people to seek emergency help.
How we monitor and enforce our rules.
We assume the best of our users, but when we detect that someone may be attempting to use our tools to plan or carry out violence, we take action, including revoking access to OpenAI’s services. Our Usage Policies set clear expectations for acceptable use and prohibit using our services for threats, intimidation, harassment, terrorism or violence, weapons development, illicit activity, destruction of property or systems, and attempts to circumvent our safeguards. We take those policies seriously and work hard to enforce them. We use automated detection systems to identify potentially concerning activity at scale. These systems analyze user content and behavior using a range of tools designed to identify signals that may indicate policy violations or harmful activity, including classifiers, reasoning models, hash-matching technologies, blocklists, and other monitoring systems.
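As a rough illustration of how several independent detection signals can be combined, the sketch below routes content to human review if any strong signal fires. The blocklist entries, hash set, classifier stub, and threshold are assumptions made for the example; they are not OpenAI’s actual tooling or rules.

```python
import hashlib

# Hypothetical signal sources; real systems are far more extensive and are updated continuously.
BLOCKLIST = {"<example banned phrase>"}
KNOWN_HARMFUL_HASHES = {"<example sha256 digest>"}


def classifier_score(text: str) -> float:
    """Placeholder for a learned classifier that returns a policy-violation probability."""
    return 0.0


def flag_for_review(text: str) -> bool:
    """Combine independent signals; any strong signal sends the content to human review."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return True
    if hashlib.sha256(text.encode()).hexdigest() in KNOWN_HARMFUL_HASHES:
        return True
    return classifier_score(text) >= 0.8  # illustrative threshold
```

The design point this illustrates is layering: exact-match tools such as hashes and blocklists catch known material cheaply, while a classifier generalizes to content that has not been seen before.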
When an account or conversation is flagged, it is assessed in context by trained personnel. These human reviewers are trained on our policies and protocols, and operate within established privacy and security safeguards, meaning their access to user information is limited, conducted within secure systems, and subject to confidentiality and data protection requirements. Their role is to assess the flagged activity in context, including the content of the interaction, surrounding conversation, and any relevant patterns of behavior over time. This contextual review is important because automated systems may identify signals of potential concern without fully capturing intent or nuance.
The goal is to determine whether the flagged activity violates our policies or indicates that a user may carry out an act of violence, whether it requires escalation for more detailed human review, or whether it can be dismissed or deprioritized as low risk or non-violative. When we determine that a bannable offense has occurred, we aim to immediately revoke access to OpenAI’s services. That may include disabling the account, banning other accounts of the same user, and taking steps to detect and stop the opening of new accounts. We have a zero-tolerance policy for using our tools to assist in committing violence. People can appeal enforcement decisions, and we review those appeals to confirm the outcome.
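The review outcomes described above can be thought of as a small set of routing decisions. The enum values and rules in this sketch are hypothetical and only mirror the three outcomes named in the text; they do not represent OpenAI’s internal workflow.

```python
from enum import Enum, auto


class ReviewOutcome(Enum):
    DISMISS = auto()   # low risk or non-violative
    ESCALATE = auto()  # needs more detailed human review
    ENFORCE = auto()   # policy violation confirmed; revoke access


def route_flag(violates_policy: bool, indicates_violence: bool, needs_more_context: bool) -> ReviewOutcome:
    """Map reviewer judgments onto the three outcomes described in the text."""
    if violates_policy or indicates_violence:
        return ReviewOutcome.ENFORCE
    if needs_more_context:
        return ReviewOutcome.ESCALATE
    return ReviewOutcome.DISMISS
```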
We surface real-world support and refer to law enforcement when appropriate.
Most enforcement actions, including bans for violence, happen directly between OpenAI and the user, making clear they have crossed a line. But in some sensitive cases, we may contact others who are best positioned to help. Where we assess that a case…