The Coming Loop
From Armin Ronacher’s Thoughts and Writings, written on June 23, 2026.
“I don’t prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops.” (Boris Cherny)
Over the last few months I have watched more and more people build something on top of coding agents that feels meaningfully different from simply using a coding agent. Some of it is built on top of Pi, which is certainly cool to see! The pattern is the same everywhere, though. Work is put into a queue of sorts, a machine picks it up, attempts it, and stops. Then some harness decides whether that was actually the end. If not, the harness continues the same session, injects another message, starts a fresh session with modified context, or sends the task to another machine. The task stays alive beyond the point where the model by itself would normally have said: “I am done.”
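Here is a minimal Python sketch of such a harness. It is not how Pi, Claude Code, or any other tool implements it. `Task`, `Next`, `run_harness`, `attempt`, and `decide` are hypothetical names, and the agent itself is stubbed out as a plain callable. The sketch shows where the decision lives: after the agent stops, outside of it. The four ways of keeping a task alive appear as explicit branches.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class Next(Enum):
    """What the harness does after the agent claims to be done (hypothetical)."""
    DONE = auto()           # accept the result; the task ends
    CONTINUE = auto()       # same session, inject another message
    FRESH_SESSION = auto()  # new session with modified context
    REASSIGN = auto()       # send the task to another machine


@dataclass
class Task:
    goal: str
    transcript: list[str] = field(default_factory=list)     # current session
    context_notes: list[str] = field(default_factory=list)  # carried across sessions
    machine: int = 0
    attempts: int = 0


def run_harness(
    queue: deque[Task],
    attempt: Callable[[Task], str],
    decide: Callable[[Task, str], Next],
    max_attempts: int = 5,
) -> list[tuple[str, str, bool]]:
    """Drain the queue; returns (goal, last result, accepted?) per task."""
    finished: list[tuple[str, str, bool]] = []
    while queue:
        task = queue.popleft()
        result = attempt(task)  # the agent runs until it stops on its own
        task.attempts += 1
        verdict = decide(task, result)
        if verdict is Next.DONE or task.attempts >= max_attempts:
            # Either accepted, or out of budget: record which one it was.
            finished.append((task.goal, result, verdict is Next.DONE))
            continue
        if verdict is Next.CONTINUE:
            task.transcript.append(f"Not done yet, keep going. Last result: {result}")
        elif verdict is Next.FRESH_SESSION:
            task.context_notes.append(f"A previous attempt ended with: {result}")
            task.transcript.clear()
        elif verdict is Next.REASSIGN:
            task.machine += 1
        queue.append(task)  # the task stays alive past the agent's "done"
    return finished
```

The `max_attempts` cap is an assumption of this sketch. Without some budget, a harness that never accepts a result would never terminate.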
I think about that type of loop more than I want to admit. There is already an agent loop inside every coding agent. The model calls a tool, incorporates the result, calls another tool, reads a file, edits a file, runs tests, and eventually produces some answer. We have been quite familiar with that loop for a long time. The other loop is the harness-level loop: the loop outside the agent loop. That loop is not new either. We have been running versions of it since the early Claude Code days. But it is becoming ever more present in agentic engineering, and in recent weeks it has started to dominate the discourse on Twitter.
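The inner agent loop can be sketched in a few lines of Python. This is an illustrative assumption, not any agent's real code. `ToolCall`, `Answer`, `agent_loop`, and the `model` and `tools` callables are hypothetical stand-ins. The loop ends when the model returns an answer instead of a tool call. The harness-level loop is whatever sits outside `agent_loop` and decides what happens after it returns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str  # e.g. "read_file" or "run_tests" (hypothetical tool names)
    arg: str


@dataclass
class Answer:
    text: str


Step = Union[ToolCall, Answer]


def agent_loop(
    model: Callable[[list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 50,
) -> str:
    """Inner loop: call tools and feed results back until the model answers."""
    history = [prompt]
    for _ in range(max_steps):
        step = model(history)
        if isinstance(step, Answer):
            return step.text  # the model's own "I am done"
        output = tools[step.name](step.arg)
        history.append(f"{step.name}({step.arg}) -> {output}")
    return "step budget exhausted"
```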
I Am Not Good At This Yet
My current status is that I have not had much success with this way of working for code I deeply care about, which turns out to be quite a lot of code. Part of that is taste and part of it is control. I try to set a high bar for what I want code to look like, and I want to understand the code I ship. Under pressure, or in a discussion with another human, I want to be able to explain what the system does without first having to ask a clanker to explain it to me. There is obviously an open question whether I will still want to understand the code a few years from now. For now, comprehension still matters to me.
Given this desire, something is missing for me in code written without my attention, particularly code that comes out of loops. Present-day models tend to produce code that is too defensive, too complex, and too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse, so far I see very little progress on this front. If anything, it feels to me that we might even be taking steps in the wrong direction. At least for my taste, present-day hands-off harnesses like Claude Code with ultracode produce worse code than what we were producing last autumn. That is because Claude Code, with Fable for instance, will work uninterrupted on a problem for thirty minutes or more. Previously, the process would have kept a human much more in the loop.
Furthermore, it is well understood that models tend to observe a local failure and add a local defense. Karpathy mentioned how they are “mortally terrified of exceptions”. In systems with important invariants, especially persisted data formats or core infrastructure, the right fix is not “handle every malformed case.” The right fix is to make the malformed case unrepresentable, or impossible to write in the first place. Yet even with a lot of manual steering, that type of code does not come out of LLMs naturally. Even when it does, they will still attempt to handle errors that are now impossible. When you put that behavior behind loops, you tend to amplify it. If each iteration adds another small defense, the system slowly becomes less understandable while appearing more robust. The more hands-off you are, the more that happens. It also teaches really bad practices when tools like this are given to juniors without clear guidance. If you ask the model why it is doing all that, it will argue its case convincingly.
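Here is a small Python illustration of the difference, using a hypothetical persisted record with a `kind` and a `size`. `defensive_size` is the pattern the paragraph criticizes: every reader tolerates every malformed shape. `Entry` instead rejects malformed data once, at the parsing boundary, and is frozen afterwards. Python cannot make a bad state truly unrepresentable at the type level the way a Rust enum can, so this is only an approximation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


# The criticized pattern: each reader quietly papers over malformed data,
# so bad records keep flowing and fallbacks multiply across the codebase.
def defensive_size(record: dict) -> int:
    size = record.get("size", 0)
    if size is None:
        size = 0
    try:
        size = int(size)
    except (TypeError, ValueError):
        size = 0
    return max(size, 0)


class Kind(Enum):
    FILE = "file"
    DIRECTORY = "directory"


@dataclass(frozen=True)
class Entry:
    """A hypothetical record that can only be constructed in a valid state."""
    kind: Kind
    size: int

    def __post_init__(self) -> None:
        if not isinstance(self.kind, Kind):
            raise TypeError(f"kind must be a Kind, got {self.kind!r}")
        if isinstance(self.size, bool) or not isinstance(self.size, int) or self.size < 0:
            raise ValueError(f"size must be a non-negative int, got {self.size!r}")
        if self.kind is Kind.DIRECTORY and self.size != 0:
            raise ValueError("directories carry no size")

    @classmethod
    def parse(cls, raw: dict) -> Entry:
        # The single boundary where malformed input is rejected, loudly.
        # Kind(...) raises ValueError for unknown kinds; a missing key raises KeyError.
        return cls(kind=Kind(raw["kind"]), size=raw["size"])


def total_size(entries: list[Entry]) -> int:
    # No fallbacks: every Entry is valid by construction.
    return sum(e.size for e in entries)
```

`total_size` carries no defensive checks. Once an `Entry` exists, the impossible cases need no handling, and checks for them would only obscure the invariant.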
Where Loops Work
At the same time, it would be dishonest to pretend the loop pattern does not work, because it already works astonishingly well in some domains. Porting code is one of them. There are already impressive examples of large automatic porting efforts, including the reported work on moving parts of Bun from Zig to Rust. I have used it successfully myself to port MiniJinja to Go. Performance exploration is another case where this works beautifully. A machine can try experiments, benchmark them, discard failures, and keep searching. Security scanning fits naturally too, and so does almost any type of research: asking a system to explore a complex problem space and report back without necessarily committing lasting code.
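The performance-exploration loop can be sketched as a search that keeps a candidate only if it still matches the baseline's result and beats the best time so far. This sketch has three simplifications. `explore` and `benchmark` are hypothetical helpers. The candidates are plain Python functions rather than agent-written patches. Best-of-N `time.perf_counter` timing is a crude stand-in for a real benchmark harness.

```python
import time
from typing import Callable, Iterable

Impl = Callable[[list[int]], int]


def benchmark(fn: Impl, data: list[int], repeat: int = 5) -> float:
    """Best-of-N wall-clock time in seconds: noise-resistant, not rigorous."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def explore(
    baseline: Impl,
    candidates: Iterable[tuple[str, Impl]],
    data: list[int],
) -> tuple[str, float, list[str]]:
    """Try each candidate; return (best name, best time, discarded names)."""
    expected = baseline(data)
    best_name, best_time = "baseline", benchmark(baseline, data)
    discarded: list[str] = []
    for name, fn in candidates:
        try:
            ok = fn(data) == expected
        except Exception:
            ok = False  # in a research loop, a crash is just a failed experiment
        if not ok:
            discarded.append(name)  # wrong or crashing: throw it away
            continue
        elapsed = benchmark(fn, data)
        if elapsed < best_time:
            best_name, best_time = name, elapsed  # new best; keep searching
    return best_name, best_time, discarded
```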
Many of these have one thing in common. They either transform code that already exists rather than generating new code, or they produce code that is intentionally short-lived. They produce proofs of concept or ideas, surface findings, or are closer to a mechanical transformation. Two properties seem to me to matter more than a harness's general ability to mechanically measure a goal: producing artifacts that do not need to last, or producing some form of clearly verifiable mechanical translation. Many successful applications of loops use another LLM as a judge or as an orchestrator. The mechanical translation case can be verified with a binary test case, but it can also be judged by an LLM instead! Claude Code, for instance, is increasingly good at creating entire experimental workflows that it then executes. Sure, the code it produces is slop, but that is more the fault of the model than a sign that the harness is a poor judge.
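The two ways of verifying a mechanical translation can be sketched as interchangeable verifiers behind the same retry loop. Everything here is hypothetical: `differential_test`, `llm_judge`, and `port_until_verified` are illustrative names. `ask` stands in for whatever model client you use; no real API is assumed. The differential test is binary, while the judge is only as good as the model behind it.

```python
from typing import Callable, Optional

Fn = Callable[[str], str]
Verifier = Callable[[Fn, Fn], bool]


def differential_test(inputs: list[str]) -> Verifier:
    """Binary check: the port must reproduce the original's output exactly."""
    def verify(original: Fn, port: Fn) -> bool:
        return all(original(x) == port(x) for x in inputs)
    return verify


def llm_judge(ask: Callable[[str], str], inputs: list[str]) -> Verifier:
    """Let a model decide whether the outputs are equivalent.

    `ask` is a placeholder for a model client: prompt in, reply text out.
    """
    def verify(original: Fn, port: Fn) -> bool:
        samples = "\n".join(
            f"input={x!r} original={original(x)!r} port={port(x)!r}" for x in inputs
        )
        reply = ask("Are these outputs equivalent? Answer PASS or FAIL.\n" + samples)
        return reply.strip().upper().startswith("PASS")
    return verify


def port_until_verified(
    original: Fn,
    attempt_port: Callable[[int], Fn],
    verify: Verifier,
    max_rounds: int = 3,
) -> Optional[Fn]:
    """Ask for new ports until one passes the verifier or rounds run out."""
    for round_no in range(max_rounds):
        port = attempt_port(round_no)
        if verify(original, port):
            return port
    return None
```

If the judge answers PASS unconditionally, `port_until_verified` accepts the first port even when it is wrong. That is the failure mode to watch for when the judge is another LLM.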