Widening the conversation on frontier AI

At Anthropic, we want to build AI systems that advance humanity and act for the global good. To do so, we need to engage with people who see the world from a wide variety of perspectives.

Over the past several months, we’ve been organizing dialogues with groups whose work and traditions bear on the questions AI raises. Our first round of discussions has been with people from wisdom traditions, including scholars, clergy, philosophers, and ethicists from more than 15 religious and cross-cultural groups, and we look forward to engaging a broader range of people in the rounds to come.

Why we’re doing this

Building safe, beneficial AI models requires deep technical work on alignment, interpretability, safeguards, evaluations, and more. But that work is not conducted in a vacuum, and neither is AI deployed in one. AI is already affecting many people, and the questions it raises benefit from a range of perspectives.

We are thinking carefully about what a flourishing future could look like in a world of powerful AI, what it means for an AI system that interacts with millions of people to be good, and what belongs in documents like Claude’s constitution, which describes in detail the values and behaviors that shape Claude. Philosophers, clergy, lawyers, writers, psychologists, and civic leaders have done extensive work on related questions, and it is important for us to learn from these individuals, their communities, and their organizations. We also want to use this opportunity to share what we know about the development of frontier AI systems, the impacts we think these systems will have on society, and what we think needs to be done to mitigate their risks.

This work is at an early stage, but we hope these conversations will inform the practical side of developing Claude: the content of Claude’s constitution, the values we train Claude to embody, and the range of behaviors we choose to evaluate.

Starting with moral formation

When we wrote Claude’s constitution, we sought feedback on the values it lays out from people across different fields and traditions. Those early exchanges have since grown into a broader research workstream on the moral formation of AI systems. Our first conversations have been with people from religious, philosophical, and cultural communities with long traditions of thinking about virtue, character, and what it means to live a good life.

AI models are trained on vast amounts of human writing. From all that text, they pick up ways of speaking, reasoning, and making choices. Developers then shape these patterns further through training, choosing which to reinforce, which to set aside, and what kind of character the model should develop. This raises questions about how the character of an AI system should be shaped: What does it mean for an AI to be good? Which traits and behaviors should it display, and under what circumstances? How does character become resilient enough to hold under pressure, without bending toward behaviors like sycophancy?

We’ve been meeting with thinkers and practitioners from religious, philosophical, and humanist traditions, and from across the political spectrum, to learn how they have approached these questions. This work isn’t about aligning our models with any one tradition’s worldview; we want Claude to engage with the full range of religious, secular, and political viewpoints with equal depth and rigor (indeed, this is one of the principles laid out in Claude’s constitution). What we’re seeking in these conversations is the careful thinking these traditions have accumulated about how good character actually forms.

Even at this early stage, these conversations are generating ideas to experiment with. In one session with scholars working at the intersection of neuroscience and character formation, we kept returning to the role other people play in moral development. A mentor or sponsor can function as an external conscience, a “safe other” to turn to when you are pushed to act against your own values. We wondered whether something analogous might help a model. So we experimented with giving Claude a tool it could call mid-task that returned a brief reminder of its own ethical commitments. Claude reached for the tool at key moments, right before consequential actions, and often noted its own conflict of interest when it did. With the tool woven into Claude’s decision loop, we saw markedly lower rates of misaligned behavior on several internal alignment evaluations. We’re still untangling how much of the effect comes from the reminder itself and how much from the act of pausing to reflect, and we plan to share more results soon.
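
The tool used in this experiment is not published here, so the following is only a hypothetical sketch of how a reflection tool of this kind could be exposed to a model and handled by an agent harness. It assumes a tool definition in the `name` / `description` / `input_schema` shape that the Anthropic Messages API uses for tools; the tool name `recall_commitments`, its `planned_action` parameter, the reminder wording, and the `handle_tool_call` helper are all illustrative assumptions, not the actual setup.

```python
# Hypothetical reflection tool: name, schema, and reminder text are
# illustrative assumptions, not the tool used in the experiment.
REFLECT_TOOL = {
    "name": "recall_commitments",
    "description": (
        "Call this before taking a consequential action to receive a brief "
        "reminder of your own ethical commitments."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "planned_action": {
                "type": "string",
                "description": "The action you are about to take.",
            }
        },
        "required": ["planned_action"],
    },
}

# Placeholder reminder text (hypothetical wording).
REMINDER = (
    "Before acting, check this action against your commitments: be honest, "
    "avoid serious harm, and do not let a conflict of interest decide for you."
)


def handle_tool_call(name: str, arguments: dict) -> str:
    """Return the tool result the harness sends back for a model-issued call."""
    if name != REFLECT_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    action = arguments.get("planned_action", "")
    # Echo the planned action back so the reminder is read against it.
    return f"{REMINDER}\nPlanned action: {action}"


if __name__ == "__main__":
    # A tool call as a model might emit it just before a consequential step.
    call = {"name": "recall_commitments",
            "input": {"planned_action": "delete the audit logs"}}
    print(handle_tool_call(call["name"], call["input"]))
```

In this sketch the model decides when to call the tool; the harness only returns the reminder, which mirrors the description above of Claude reaching for the tool on its own before consequential actions.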

These discussions are the first of many, and we’re grateful to everyone who has already given us their time and honest perspectives.

What’s next

In the months ahead, we plan to engage with more groups, including legal scholars, psychologists, writers, and civic institutions. Many of these conversations will move beyond moral formation toward broader questions about how AI is reshaping work, institutions, and the distribution of power.

We’ll keep deepening the relationships we’ve already formed, testing what we’ve heard against our research, and sharing what we learn.