When AI Builds Itself: Our progress toward recursive self-improvement
For most of AI’s history, humans drove every step in its development cycle. But at Anthropic, we are delegating a growing share of AI development to AI systems themselves, which is speeding up our work.
Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement. We are not there yet, and recursive self-improvement is not inevitable. But it could come sooner than most institutions are prepared for.
Using public benchmarks and previously unreported data from within Anthropic, The Anthropic Institute shows that AI is already accelerating the development of AI systems. To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021 to 2025.
The technical trends discussed in this piece suggest that AI systems are going to become much more capable in coming years. These trends have huge implications. AI that can build itself would be a major development in the history of technology, one that could bring enormous good for the world in science, healthcare, and beyond. But full recursive self-improvement might also increase the risk of humans losing control over AI systems. If systems are capable of fully building their own successors, the ways we secure them, monitor them, and shape their behavior all grow much more important.
2021–2023: Building the first Claude
In the early days, work at Anthropic looked like work at any other tech company: people writing code and docs on laptops.
2023–2025: Chatbots
People used early chatbots to help with parts of the process, like generating short code snippets and copying the output into text editors.
2025–2026: Coding agents
As the agents became more capable, they were able to write and edit code on their own, sometimes entire files.
Today: Autonomous agents
Agents can now run code themselves and delegate hours of work to other agents.
20XX?: Closing the loop
In the future, agents could become capable enough to build and train models themselves. If this happens, future versions of Claude could be continuously improved by Claude itself.
Evidence from the outside world
The rate at which AI models improve is accelerating. The length of tasks that they can reliably complete on their own has been doubling roughly every four months, up from an earlier trend of doubling every seven months.
In March 2024, Claude Opus 3 could complete software tasks that take humans about four minutes. A year later, Claude Sonnet 3.7 managed tasks that took about an hour and a half. A year after that, Claude Opus 4.6 managed 12-hour tasks. If this trend holds, tasks that take a skilled person days could come into range this year. In 2027, AI systems could be capable of tasks that take a person weeks.
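The arithmetic behind a doubling time can be checked with a short sketch. It is illustrative only and not how the underlying benchmark computes its trend: it assumes the three task lengths quoted above are exactly 12 months apart and that growth is a clean exponential between them. The function names `doubling_time_months` and `extrapolate_minutes` are hypothetical helpers introduced for this example.

```python
import math


def doubling_time_months(start_minutes, end_minutes, elapsed_months):
    """Months per doubling implied by growth from start to end (assumes exponential growth)."""
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


def extrapolate_minutes(current_minutes, months_ahead, doubling_months):
    """Project the task length forward, assuming the doubling time stays constant."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


# Approximate task lengths quoted in the text, in minutes of human time.
# Assumption: each data point is exactly 12 months after the previous one.
opus_3 = 4            # March 2024
sonnet_3_7 = 90       # about a year later
opus_4_6 = 12 * 60    # about a year after that

first_year = doubling_time_months(opus_3, sonnet_3_7, 12)
second_year = doubling_time_months(sonnet_3_7, opus_4_6, 12)
overall = doubling_time_months(opus_3, opus_4_6, 24)

print(f"first year:  {first_year:.1f} months per doubling")   # 2.7
print(f"second year: {second_year:.1f} months per doubling")  # 4.0
print(f"overall:     {overall:.1f} months per doubling")      # 3.2

# Projecting one more year at a four-month doubling time (three more doublings).
projected_hours = extrapolate_minutes(opus_4_6, 12, 4) / 60
print(f"projected task length in 12 months: {projected_hours:.0f} hours")  # 96
```

Under these assumptions, the second-year figures correspond to exactly three doublings in twelve months, which matches the four-month doubling time. The projected 96 hours is about twelve eight-hour working days, in line with the multi-day tasks described above. The projection is only as good as the assumption that the trend holds.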
The same pattern appears on coding and research benchmarks. Benchmarks measure the performance of models in a given domain, and they’re “saturated” when models achieve close to 100% performance. SWE-bench is a standard test of real-world software engineering: it hands a model an actual open-source codebase and a real bug report, and asks it to write a code change that fixes the issue and passes the project’s own tests. Models have gone from scoring in the low single digits to saturating the benchmark in two years.
CORE-Bench tests whether a model can reproduce existing research, a prerequisite for conducting original research. It gives an AI model the code and data behind a published paper, and asks it to rerun everything and confirm it can replicate the paper’s results. AI systems went from succeeding at reproducing the results roughly 20% of the time in 2024 to saturating the benchmark fifteen months later. METR, which runs the benchmark measuring how well models can complete long-duration tasks, found that Claude Mythos Preview could work for “at least” 16 hours and was “at the upper end of what [METR] can measure without new tasks.”
Public benchmarks say a lot about the capabilities of these systems. But they can’t reveal how much AI systems are speeding up AI development itself. For that, we need direct evidence from within AI companies like Anthropic.
Evidence from within Anthropic
Building a frontier model takes two broad categories of work. There is engineering: writing the code, standing up the infrastructure, and overseeing the model training. And there is research: deciding what experiments to run, interpreting what comes back, and figuring out which ideas to try next.
Across both engineering and research, the picture is consistent. In engineering, Claude can be handed an underspecified problem and figure out how to solve it; humans supply the goal, but they no longer need to supply the method. In research, Claude can already match or outperform skilled humans at executing a well-specified experiment. However, large performance gaps persist when it comes to Claude exercising judgement in choosing goals in both engineering and research. That’s the gap between AI today and a future system that could autonomously design its own successor.
It’s common for employees at Anthropic to receive more open-ended and important tasks as they gain more experience. Early on, they execute a task someone else specified, like, “The export button isn’t working, please fix it.” With experience, they’re handed a goal and design the approach themselves, such as, “Investigate why the network slows down under heavy load.” At the most senior levels, they are deciding which problems are worth working on at all: “What should the team build next quarter?” We can use internal Anthropic data to see how far Claude has come in being able to handle these different kinds of tasks.
Claude writes a significant proportion of Anthropic’s code. As of May 2026, more than 80% of the code we merge into Anthropic’s codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, this number was in…