An open-source spec for orchestration: Symphony

Six months ago, while working on an internal productivity tool, our team made a decision that was controversial at the time: we’d build our repo with no human-written code. Every line in our project repository had to be generated by Codex.

To make that work, we redesigned our engineering workflow from the ground up. We built an agent-friendly repository, invested heavily in automated tests and guardrails, and treated Codex as a full-fledged teammate. We documented that journey in our previous blog post on harness engineering.

It worked, but then we ran into the next bottleneck: context switching.

To solve this new problem, we built a system called Symphony. Symphony is an agent orchestrator that turns a project-management board like Linear into a control plane for coding agents. Every open task gets an agent, agents run continuously, and humans review the results.

This post explains how we created Symphony, which led to a 500% increase in landed pull requests on some teams, and how you can use it to turn your own issue tracker into an always-on agent orchestrator.

The ceiling of interactive coding agents

Even as they get easier to use, coding agents, whether accessed through a web app or the CLI, are still interactive tools.

As the scale of agentic work increased at OpenAI, we found a new kind of burden. Each engineer would open a few Codex sessions, assign tasks, review the output, steer the agent, and repeat. In practice, most people could comfortably manage three to five sessions at a time before context switching became painful. Beyond that, productivity dropped. We’d forget which session was doing what, jump between terminals to nudge agents back on track, and debug long-running tasks that stalled halfway through.

The agents were fast, but we had a system bottleneck: human attention. We had effectively built a team of extremely capable junior engineers and then assigned our human engineers to micromanage them. That wasn’t going to scale.

A shift in perspective

We realized we were optimizing the wrong thing. We were orienting our system around coding sessions and merged PRs, when PRs and sessions are really a means to an end. Software workflows are largely organized around deliverables: issues, tasks, tickets, milestones.

So we asked ourselves what would happen if we stopped supervising agents directly and instead let them pull work from our task tracker.

That idea became Symphony, a written spec that functions as a supervisor to orchestrate agentic work.

Turning our issue tracker into an agent orchestrator

Symphony started with a simple concept: any open task should get picked up and completed by an agent. Instead of managing Codex sessions in multiple tabs, we made our issue tracker the control plane.

In this setup, each open Linear issue maps to a dedicated agent workspace. Symphony continuously watches the task board and ensures that every active task has an agent running in the loop until it’s done. If an agent crashes or stalls, Symphony restarts it. If new work appears, Symphony picks it up and puts an agent to work on it.
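
The reconciliation behavior described above can be sketched as a single pass that compares the set of active issues with the set of running agents. This is a minimal illustration, not Symphony’s implementation: `Orchestrator`, `AgentRun`, `reconcile`, the heartbeat field, and the 600-second stall timeout are all hypothetical, and launching a real agent in a workspace is reduced to recording a run in memory.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical handle for one agent working in one issue's workspace."""
    issue_id: str
    last_heartbeat: float  # last time the agent reported progress
    crashed: bool = False
    restarts: int = 0


@dataclass
class Orchestrator:
    stall_timeout: float = 600.0  # assumed: seconds without progress before a run counts as stalled
    runs: dict[str, AgentRun] = field(default_factory=dict)

    def _launch(self, issue_id: str, now: float, restarts: int = 0) -> None:
        # A real orchestrator would prepare a workspace and start a coding agent here.
        self.runs[issue_id] = AgentRun(issue_id, last_heartbeat=now, restarts=restarts)

    def reconcile(self, active_issue_ids: set[str], now: float) -> dict[str, list[str]]:
        """One pass of the loop: make running agents match the active issues on the board."""
        actions: dict[str, list[str]] = {"started": [], "restarted": [], "stopped": []}
        running = set(self.runs)
        # New work on the board gets an agent.
        for issue_id in sorted(active_issue_ids - running):
            self._launch(issue_id, now)
            actions["started"].append(issue_id)
        # Crashed or stalled agents on still-active issues are restarted.
        for issue_id in sorted(active_issue_ids & running):
            run = self.runs[issue_id]
            if run.crashed or now - run.last_heartbeat > self.stall_timeout:
                self._launch(issue_id, now, restarts=run.restarts + 1)
                actions["restarted"].append(issue_id)
        # Issues that left the active set (for example, because they are done) release their agent.
        for issue_id in sorted(running - active_issue_ids):
            del self.runs[issue_id]
            actions["stopped"].append(issue_id)
        return actions


orch = Orchestrator()
print(orch.reconcile({"ENG-1", "ENG-2"}, now=0))
# {'started': ['ENG-1', 'ENG-2'], 'restarted': [], 'stopped': []}
orch.runs["ENG-1"].crashed = True  # simulate a crashed agent
print(orch.reconcile({"ENG-1", "ENG-2", "ENG-3"}, now=60))
# {'started': ['ENG-3'], 'restarted': ['ENG-1'], 'stopped': []}
```

Running `reconcile` on a timer (or on tracker webhooks) is what makes the loop “continuous”: each pass is idempotent, so missing or repeating a pass does no harm.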

We built our workflow based on ticket statuses, using the task manager Linear as a state machine.
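
To make the state-machine idea concrete, here is a minimal sketch in which ticket statuses are states and only some transitions are allowed. The status names, the allowed transitions, and the split between agent-owned and human-owned statuses are assumptions for illustration; the post does not list the statuses Symphony uses, and a real Linear workspace defines its own.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical board columns, not Symphony's actual configuration.
    BACKLOG = "Backlog"
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    IN_REVIEW = "In Review"
    DONE = "Done"


# Allowed transitions: the edges of the state machine.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.BACKLOG: {Status.TODO},
    Status.TODO: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.DONE, Status.TODO},  # a human approves, or sends it back for rework
    Status.DONE: set(),
}

# Assumed: statuses in which the orchestrator keeps an agent running on the issue.
AGENT_OWNED = {Status.TODO, Status.IN_PROGRESS}


def move(current: Status, target: Status) -> Status:
    """Return the new status, refusing transitions the state machine does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def needs_agent(status: Status) -> bool:
    """True if an issue in this status should have an agent working on it."""
    return status in AGENT_OWNED


status = Status.TODO
status = move(status, Status.IN_PROGRESS)
status = move(status, Status.IN_REVIEW)
print(status.value, needs_agent(status))  # prints "In Review False"
```

With a model like this, the orchestrator only has to ask `needs_agent(status)` for each issue on the board, and in this sketch moving an issue from review back to `Todo` is enough to put an agent on it again.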

In practice, Symphony decouples work from sessions and from pull requests. Some issues produce multiple PRs across repos; others are pure investigation or analysis that never touch the codebase.

Once work is abstracted this way, tickets can represent much larger units of work.

We regularly use Symphony to orchestrate complex features and infrastructure migrations. For example, we might file a task asking the agent to analyze the codebase, Slack, or Notion and produce an implementation plan. Once we’re happy with the plan, the agent generates a tree of tasks, breaking the work into stages and defining dependencies between tasks.

Agents only start working on tasks that aren’t blocked, so execution of this task graph, a directed acyclic graph (DAG) of dependencies, unfolds naturally, in parallel wherever the dependencies allow. For example, we marked the React upgrade as blocked on a migration to Vite. As expected, agents started upgrading React only after the migration to Vite was complete.
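
The “only start unblocked tasks” rule is ordinary topological scheduling. The sketch below uses Python’s standard-library `graphlib.TopologicalSorter` to group a task tree into waves of tasks that could run in parallel. Only the Vite and React tasks come from the example above; the other two tasks and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all of its blockers finished."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # unblocked tasks: agents could work on these in parallel
        waves.append(ready)
        sorter.done(*ready)  # simplification: treat the whole wave as finished before the next
    return waves


# Hypothetical task tree: each task maps to the set of tasks that block it.
tasks = {
    "Migrate build to Vite": set(),
    "Upgrade React": {"Migrate build to Vite"},
    "Audit deprecated APIs": set(),
    "Fix React warnings": {"Upgrade React", "Audit deprecated APIs"},
}
for i, wave in enumerate(execution_waves(tasks), start=1):
    print(i, wave)
# 1 ['Audit deprecated APIs', 'Migrate build to Vite']
# 2 ['Upgrade React']
# 3 ['Fix React warnings']
```

Grouping into waves is a simplification. In a live system each agent finishes at a different time, so a scheduler would call `done()` for each task as it completes and hand any newly ready tasks to agents immediately.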

Agents can also create work themselves. During implementation or review, they often notice improvements that fall outside the scope of the current task: a performance issue, a refactoring opportunity, or a better architecture. When that happens, they simply file a new issue that we can evaluate and schedule later, and many of these follow-up tasks also get picked up by agents. We still oversee the process, but the agents stay organized and keep work moving forward.

This way of working dramatically reduces the cognitive cost of kicking off ambiguous work. If the agent gets something wrong, that’s still useful information, and the cost to us is near zero. We can very cheaply file tickets for the agent to go prototype and explore, and throw away any explorations we don’t like.

Because the orchestrator runs on devboxes and never sleeps, we can add tasks from anywhere and know an agent will pick them up. For instance, one engineer on our team made three significant changes from the Linear app on his phone while staying in a cozy cabin with shoddy Wi-Fi.

An increase in exploration from working this way

When we observed the effects of working with Symphony, the most obvious change was output. On some teams at OpenAI, the number of landed PRs increased by 500% in the first three weeks. Outside of OpenAI, Linear founder Karri Saarinen highlighted a spike in newly created workspaces when we released Symphony. However, the deeper shift is in how teams think about work.

Once our engineers no longer spend time supervising Codex sessions, the economics of making a code change shift completely. The perceived cost of each change drops because we’re no longer investing human effort in driving the implementation itself.

That changed our behavior.