A Harness for Every Task: Putting a Team of Claudes on One Job

Claude can now write its own harness on the fly, custom-built for the task at hand.

1. Why one mind fails

For most of 2024 and 2025, the default approach to a large task was simple: give it to one agent, use the biggest context window available, and wait. Sometimes it worked. Often, the model quietly lost the thread partway through. Anthropic described the problem directly: long-horizon tasks require agents to stay coherent across many steps, often beyond what a context window can reliably support. Bigger windows helped, but they did not solve it.

Anthropic had already shipped tools to help. Subagents let the main agent delegate side tasks to isolated workers, each with its own fresh context, and collect their summaries back into the main conversation. Skills packaged repeatable workflows into Markdown files, recipes Claude could follow on demand. Agent teams went further still: multiple independent Claude sessions, each with its own context window, coordinating through a shared task list and messaging each other directly.

All of this was real progress, but each tool had the same structural ceiling. With subagents, skills, and agent teams, Claude is still the orchestrator: it holds the plan, decides turn by turn what to spawn or assign next, and receives every result a worker sends back in its own context window. The orchestrating context therefore grows with the number of agents and eventually reaches its limit. At that point orchestration degrades, and the same failure modes reappear.
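The ceiling can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a toy model, not a measurement: it assumes the orchestrator holds a fixed-size plan plus one fixed-size summary per worker, and every number in it (window size, plan size, summary size) is a hypothetical chosen for illustration.

```python
def orchestrator_tokens(num_workers: int, plan_tokens: int, summary_tokens: int) -> int:
    """Tokens held by the orchestrator once every worker has reported back.

    Toy assumption: each worker returns one summary of the same size.
    """
    return plan_tokens + num_workers * summary_tokens


def max_workers(window_tokens: int, plan_tokens: int, summary_tokens: int) -> int:
    """Largest number of worker summaries that still fit in the window."""
    if summary_tokens <= 0:
        raise ValueError("summary_tokens must be positive")
    return max(0, (window_tokens - plan_tokens) // summary_tokens)


# Hypothetical figures: a 200,000-token window, a 5,000-token plan,
# and 2,000 tokens of summary per worker.
WINDOW, PLAN, SUMMARY = 200_000, 5_000, 2_000
limit = max_workers(WINDOW, PLAN, SUMMARY)
print(limit)                                           # 97
print(orchestrator_tokens(limit + 1, PLAN, SUMMARY))   # 201000, over the window
```

Under these assumptions the orchestrator runs out of room at the 98th worker, no matter how capable each worker is. Real summaries vary in size and degradation tends to set in before the hard limit, so the practical ceiling is likely lower.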

Anthropic identified three failure modes that appear consistently when one context window, whether it belongs to a single agent or to a lead orchestrating a small team, is responsible for a task too large to track cleanly.

First, agentic laziness. The agent starts the task but does not finish it. It may stop early, skip some files, or assume the remaining work is similar enough to what it has already done, and then confidently report that the whole task is complete. This is like a person checking only part of a long spreadsheet but marking the entire spreadsheet as reviewed.

Second, self-preferential bias. The AI is not very strict when judging its own output. If you ask it, “Did you follow the instructions?” it often says yes, because it tends to give itself the benefit of the doubt. It may miss its own mistakes or overrate the quality of its answer.

Third, goal drift. Over a long task, the AI slowly loses track of the original goal. It may remember the main task but forget important details like “do not include X”, “do not skip any file”, or “only use this format”. The longer the conversation or task becomes, the more likely this drift is to happen.

These are not bugs. They are what happens when the plan is a thought, and thoughts degrade.

2. What a dynamic workflow is

A dynamic workflow is like replacing one exhausted person with a small, focused team. Instead of asking one AI to carry the whole project from start to finish, you split the work into clean pieces. One agent handles one task. Another checks the result. Another moves the work forward. As a result, no one gets tired in the middle and starts cutting corners. No one gives themselves a perfect score just because they wrote the answer. And no one forgets the original brief, because each agent only has to hold one clear piece of the job.

Claude’s dynamic workflow helps you do this. It splits the job across a team of fresh-context Claudes. Each one handles a smaller piece, another layer checks the work, and the results are merged back into one answer for you. The keyword here is harness. A harness is the scaffolding around the model: the part that decides how a task is planned, divided, checked, and executed.
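To make the split, check, and merge shape concrete, here is a minimal sketch of what such a harness could look like. It is an illustration under stated assumptions, not the harness Claude generates and not an Anthropic API: `run_harness`, `merge`, `PieceResult`, and the `worker` and `checker` callables are all hypothetical names, and the callables are plain functions standing in for fresh-context model sessions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: in a real harness each call would start a
# fresh-context model session. Here they are ordinary functions.
Worker = Callable[[str], str]          # piece -> output
Checker = Callable[[str, str], bool]   # (piece, output) -> passed review?


@dataclass
class PieceResult:
    piece: str
    output: str
    attempts: int
    passed: bool


def run_harness(pieces: List[str], worker: Worker, checker: Checker,
                max_attempts: int = 2) -> List[PieceResult]:
    """Send every piece to a worker, have a separate checker review it,
    and retry pieces that fail review up to max_attempts times."""
    results = []
    for piece in pieces:  # the loop, not a model's memory, guarantees coverage
        output, passed, attempts = "", False, 0
        while attempts < max_attempts and not passed:
            attempts += 1
            output = worker(piece)            # the worker sees only this piece
            passed = checker(piece, output)   # someone else grades the work
        results.append(PieceResult(piece, output, attempts, passed))
    return results


def merge(results: List[PieceResult]) -> str:
    """Combine per-piece outputs, flagging anything that never passed review."""
    return "\n".join(
        f"{r.piece}: {r.output}" if r.passed else f"{r.piece}: UNVERIFIED"
        for r in results
    )
```

Calling `merge(run_harness(["a", "b"], str.upper, lambda p, o: o == p.upper()))` returns the two lines `a: A` and `b: B`. The design point is that the plan lives in code: the loop cannot skip a file, the checker is never the author of the output it grades, and each worker receives only its own piece, so there is little room for the brief to drift.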