Discover Broadly, Implement Narrowly

I have been building software with coding agents, and the work now goes well beyond isolated functions and local fixes. It increasingly includes requirements, architecture, implementation, review, correction and maintenance. That experience has made me wonder whether many agentic coding workflows are optimising for the wrong finish line.

The usual question is whether the agent completed the task. Did the feature work? Did the tests pass? Did the build succeed? Was existing behaviour preserved? All of that matters. But another question keeps intruding: what did completing this task reveal about the architecture of the system?

A change can work perfectly and still expose a deeper problem. It may introduce a second authority path, duplicate lifecycle state, blur a trust boundary, attach evidence to the wrong identity or leave behind code that nobody can confidently repair without returning to another agent. A passing build proves that the requested change works. It does not prove that the system is becoming safer, simpler, more intelligible or more maintainable.

That distinction matters more as agents produce implementation faster than humans can come to understand the resulting system. What follows is an attempt to sketch a bounded stewardship framework around that problem: one that separates observation from action, classifies what implementation reveals and reserves architectural change for a slower, more deliberate process.

Task completion is not system stewardship

Coding agents are usually given bounded tasks. Add this feature. Fix this bug. Implement this specification. Make the tests pass. Preserve existing behaviour. Given those instructions, their behaviour is rational: they optimise for task completion and try to avoid unnecessary changes. In many contexts, that restraint is desirable. Nobody wants an agent to treat every feature request as permission to redesign the repository.

Yet the same restraint can preserve structural weaknesses indefinitely. An agent may notice that several modules duplicate the same authority rule. It may discover that the object model cannot represent a newly required distinction. It may find that one field is serving simultaneously as draft state, publication state and user visibility. It may realise that a future maintainer will struggle to reconstruct why a particular implementation exists. Unless the task explicitly authorises architectural review, the agent is usually encouraged to solve the local problem and move on.

The opposite instruction would be just as dangerous. Telling an agent to “always improve the architecture” invites speculative abstractions, opportunistic refactoring, fashionable infrastructure and the repeated reopening of settled decisions. The answer is not simply to make agents more architectural. It is to give them wider permission to observe than to act.

Observation authority should be wider than action authority

I find it useful to distinguish between two roles. The implementer remains tightly bounded. It completes the authorised request, preserves accepted constraints, adds the necessary tests and avoids unrelated refactoring. The steward is allowed to look more widely. It may ask whether the implementation has revealed a duplicated authority path, a missing trust boundary, lifecycle drift, object-model strain, weak observability or an absent recovery path. It may equally record evidence that the architecture is working as intended.

The important point is not merely that there are two roles. Software engineering already distinguishes between authors and reviewers, feature work and refactoring, and implementation and architecture. The more specific claim is that the right to observe should be broader than the right to modify. Ordinary scope discipline says: do not touch unrelated problems. Architectural stewardship should say something more demanding: notice relevant problems beyond the immediate change, surface them through a governed channel and do not implement them without separate authority.

The stewardship channel may discover beyond scope. It may not implement beyond scope. That asymmetry creates room for architectural awareness without turning every coding task into a redesign exercise.
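
One way to picture the asymmetry is as two path scopes, with the modifiable set nested inside the observable one. The sketch below is only an illustration under assumptions: the `Authority` class, the `route` function, the glob patterns and the file paths are all hypothetical names invented here, not part of any existing agent tooling.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Authority:
    """Hypothetical scope model: what an agent may inspect versus what it may edit."""
    observe: tuple[str, ...]  # glob patterns the agent may read and comment on
    modify: tuple[str, ...]   # glob patterns the agent may actually change

    def can_observe(self, path: str) -> bool:
        return any(fnmatchcase(path, pattern) for pattern in self.observe)

    def can_modify(self, path: str) -> bool:
        # Modification is only meaningful inside the observable scope.
        return self.can_observe(path) and any(
            fnmatchcase(path, pattern) for pattern in self.modify
        )


def route(authority: Authority, path: str) -> str:
    """Decide what the agent may do with a concern located at `path`."""
    if authority.can_modify(path):
        return "implement"    # inside the authorised task
    if authority.can_observe(path):
        return "report"       # surface through the stewardship channel only
    return "out of view"      # not observable under this task at all


# Assumed example scopes: the task touches billing, the steward sees all of src/.
task = Authority(observe=("src/*",), modify=("src/billing/*",))
print(route(task, "src/billing/invoice.py"))  # implement
print(route(task, "src/auth/policy.py"))      # report
print(route(task, "infra/deploy.yaml"))       # out of view
```

The middle case is the interesting one: a finding in `src/auth/` is visible and reportable, but nothing in the task grants the right to change it.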

The human burden changes, but does not disappear

A fair objection is that this simply moves the burden. Instead of asking a human to understand all the generated code, we ask the human to understand the agent’s architectural findings. That objection is partly correct. The process does not remove the need for human judgement. It changes the form of the work.

Rather than expecting a person to reconstruct every architectural consequence from a large generated diff, the agent is required to surface a limited set of grounded observations. Each observation should be tied to a known constraint, concrete evidence, a likely consequence, a confidence level, an owner and a condition under which the concern would be shown to be wrong. The human burden becomes adjudication rather than total reconstruction. That is still demanding. But it is a more realistic use of scarce attention than expecting every user of an agentic coding tool to notice every hidden architectural consequence unaided.
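
The six parts of a grounded observation map naturally onto a small record type. What follows is a minimal sketch, assuming a Python representation; the `Observation` class, its field names, the `missing_fields` helper and the sample values are all hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Observation:
    """Hypothetical record for one stewardship finding."""
    constraint: str            # the known constraint the finding relates to
    evidence: tuple[str, ...]  # concrete pointers: files, diffs, incidents
    consequence: str           # what is likely to happen if nothing changes
    confidence: Confidence
    owner: str                 # who adjudicates the finding
    falsified_if: str          # condition that would show the concern is wrong


def missing_fields(obs: Observation) -> list[str]:
    """Return the parts that are empty; an empty list means the finding is grounded."""
    missing = []
    if not obs.constraint.strip():
        missing.append("constraint")
    if not obs.evidence:
        missing.append("evidence")
    for name in ("consequence", "owner", "falsified_if"):
        if not getattr(obs, name).strip():
            missing.append(name)
    return missing


# Illustrative values only; none of these names come from a real system.
finding = Observation(
    constraint="Authority rules live in one module",
    evidence=("src/orders/access.py", "src/reports/access.py"),
    consequence="The two copies drift and permit different actions",
    confidence=Confidence.MEDIUM,
    owner="platform team",
    falsified_if="Both call sites delegate to the same shared check",
)
print(missing_fields(finding))  # []
```

A review queue could refuse any finding for which `missing_fields` is non-empty, which keeps ungrounded speculation out of the human's adjudication workload.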

This framework does not eliminate the comprehension bottleneck. It tries to compress and structure it.

What counts as architectural evidence?

Not every successful test, preference or hypothetical concern should count as architectural evidence. A useful definition is this: architectural evidence is implementation or operational information that materially increases or decreases confidence in an architectural proposition.
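
Read literally, the definition has two conditions: the information must come from implementation or operation, and it must materially shift confidence in a stated proposition. A rough sketch of that filter follows. The `Source`, `Effect` and `Claim` types and the sample claims are assumptions made for illustration, and deciding whether a shift is material remains a human judgement that the code merely records.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    IMPLEMENTATION = "implementation"  # observed while building or changing code
    OPERATION = "operation"            # observed while running the system
    OPINION = "opinion"                # preference or hypothetical concern


class Effect(Enum):
    INCREASES = "increases"    # materially raises confidence in the proposition
    DECREASES = "decreases"    # materially lowers confidence in the proposition
    NEGLIGIBLE = "negligible"  # no material shift either way


@dataclass(frozen=True)
class Claim:
    proposition: str  # the architectural proposition under discussion
    statement: str    # what was observed or asserted
    source: Source
    effect: Effect


def is_architectural_evidence(claim: Claim) -> bool:
    """Apply the two-part definition: a grounded source and a material shift."""
    grounded = claim.source in (Source.IMPLEMENTATION, Source.OPERATION)
    material = claim.effect is not Effect.NEGLIGIBLE
    return grounded and material


# Illustrative claims only.
duplicated_rule = Claim(
    proposition="Authority is decided in one place",
    statement="Several modules duplicate the same authority rule",
    source=Source.IMPLEMENTATION,
    effect=Effect.DECREASES,
)
routine_pass = Claim(
    proposition="Authority is decided in one place",
    statement="The existing unit tests still pass",
    source=Source.IMPLEMENTATION,
    effect=Effect.NEGLIGIBLE,
)
preference = Claim(
    proposition="The current module layout is adequate",
    statement="A different layout would feel cleaner",
    source=Source.OPINION,
    effect=Effect.DECREASES,
)
for claim in (duplicated_rule, routine_pass, preference):
    print(claim.statement, "->", is_architectural_evidence(claim))
```

Only the first claim passes the filter: the passing tests are grounded but shift nothing, and the preference asserts a shift without any grounding.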

A second, materially different workflow successfully reusing the same transition model is evidence. Two independent changes duplicating the same authority logic are evidence of pressure. A production incident exposing ambiguous version binding is evidence of contradiction. “Microservices might scale better someday” is not evidence. The distinction matters because architecture discussions often…