Tokens and Dreams

The one great principle of the English law is, to make business for itself.

The recurring theme running through my mind the last few months has been complexity within a software application. Forget coding. Sales is using AI to write all new code, so for us engineers there’s not a hell of a lot to do besides think (and be there to hold the bag).

Last week I generated a CSV of some internal company metrics. With only a sentence or two of prompt, generative AI extrapolated meaningful signals, correlated changes in the data with external signals that were not explicitly expressed (e.g. interest rate hikes), and built a polished interactive dashboard with relevant visualizations. Never mind the fetishization of dark mode or the telltale slop signs (what is it with that fucking font?) - most people would never notice these. It’s coded to look “modern” and it looks the part. I didn’t even ask for the dashboard or any visualizations. Results like these seem magical. I believe this is how most people experience generative AI.

Around the same time, I ran another AI coding experiment on one of my smaller open-source libraries, scout, and the process was so riddled with flaws and subtle failures that I know I lost time (and sanity) by even attempting to let AI write code. You see, scout is just a dead-simple RESTful search server written as a Flask app. This is not frontiers-of-engineering shit, it’s about as mechanical as it gets in terms of implementation. As in my previous experiments with AI, the strength of the tool in coding tasks was that it could trace logic bugs and find inconsistencies precisely and accurately. The weakness was that as soon as it began to write code it produced tangles of weeds that had to be aggressively hand-pruned, because with each iteration the weeds tended to spread…and spread.

Claude and his bros sit down to write me some code.

This is why I’m stuck. I’m stuck between competing narratives, each of which is exerting real business pressure. To push back when people’s daily experience of AI is of the magical variety is seen as almost perverse. I find myself constantly wanting to say “No! I embrace these tools! This is not thinly-veiled self-preservation! Just hear me out…” But how do I express this when, at every turn, a new silver bullet for agent orchestration, automatic coding, automatic review, automatic thinking is being announced? Going further, as one concerned with code as ground truth for a system, how do I take the leap of faith and relinquish control to a swarm of agents and Markdown files?

Cybernetics

The map is not the territory.

These dynamics, the rise of agentic coding loops, and some unrelated UFO stuff had me thinking about cybernetics (of all things). Cybernetics emerged after WWII as a framework for studying control mechanisms in complex systems. The canonical example is a thermostat that kicks on heating or cooling when the temperature falls outside the specified range, and then returns to passive mode when back within the acceptable range. The central idea is feedback. The “first law” of cybernetics, Ashby’s Law of Requisite Variety, states that in order to control a system, the regulating function (feedback) must be able to match the state-space complexity of the operating environment. The idea is that without adaptive control, the environment dominates the system and eventually leads to failure.
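The thermostat and the requisite-variety idea can be made concrete with a small Python sketch. Everything in it is an illustrative assumption: the `Thermostat` class, its `low` and `high` thresholds, and the toy `outcomes` function are made up for this post, not taken from any cybernetics text. The second half models a regulator that answers each disturbance with its closest available response, and shows what is left over when it has too few responses to choose from.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Simple on/off controller with a dead band (hypothetical example)."""
    low: float = 19.0   # heat when the temperature drops below this
    high: float = 23.0  # cool when the temperature rises above this

    def step(self, temperature: float) -> str:
        # Feedback: the measured state of the environment selects the action.
        if temperature < self.low:
            return "heat"
        if temperature > self.high:
            return "cool"
        return "idle"  # passive mode while inside the acceptable range


def outcomes(disturbances, responses):
    """Toy model of requisite variety.

    The outcome of a disturbance d met with response r is (d - r). For each
    disturbance the regulator picks the response that brings the outcome
    closest to zero. Returns the set of distinct outcomes that remain.
    """
    result = set()
    for d in disturbances:
        best = min(responses, key=lambda r: abs(d - r))
        result.add(d - best)
    return result


if __name__ == "__main__":
    thermostat = Thermostat()
    print([thermostat.step(t) for t in (17.0, 21.0, 25.0)])

    disturbances = [0, 1, 2, 3]
    # As many responses as disturbances: the outcome is pinned to one value.
    print(outcomes(disturbances, responses=[0, 1, 2, 3]))  # {0}
    # Half as many responses: some variety leaks through to the outcome.
    print(outcomes(disturbances, responses=[0, 2]))        # {0, 1}
```

With four disturbances and only two responses, at least 4 / 2 = 2 distinct outcomes survive, which is the counting form of Ashby’s law: only variety in the regulator can absorb variety in the environment.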

In software engineering, I see a two-layered system where at the surface you have the software artifact itself, the application that users interact with. It must be able to encode and handle the complexity of its intended uses. And then beneath that you have the actual code, the primary source of truth, where it is the programmer who is the control function for the overall system. The programmer’s job, then, is two-fold: to manage the state of the code so that it can produce an artifact, and to make sure that artifact, in turn, correctly handles its designed use case. This framing also explains to me why I’ve found the greatest utility in AI tooling in analysis tasks. When directed to do deep analyses on existing code-bases, reason about design tradeoffs, trace deadlocks or diagnose memory leaks, AI has been amazing. In cybernetic terms, AI extends the amount of variety I’m able to cope with, and allows me to better regulate the code-base.

Yet when directed top-down with specs, no matter how detailed, AI replaces the regulator with its own loop, made from the same substrate as the thing being regulated - the model watching the code and the model producing the code are now the same kind of process, and control dissolves. According to that first law, the programmer must be able to match the state-space complexity of the code itself in order to wield it effectively and adapt it over time. Over the years, approaches like Agile, YAGNI, and KISS have all tended towards optimizing for this kind of adaptability. The core idea is to keep the system simple and minimal enough that both the programmer and the software artifact can adapt as things unfold.

On the other end of the spectrum, domain-driven design and spec-driven development emphasize explicit front-loading of complexity modeling. This way the operating modes of the system are well-understood beforehand and the programmer’s role becomes more mechanical. Formal methods, meanwhile, are in their own special corner. They front-load, too, but are anchored to machine-verifiable proofs and are the opposite of a vibed-out Markdown file. Those readers who are familiar with my open-source work can probably guess which camp I belong to. I prefer smaller tools, built bottom-up, where the design, behavior and invariants can reasonably be held in your head. Designing software from the bottom up means building the lower-level component pieces to be clean and orthogonal, so that they can be composed into larger structures.
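As a rough illustration of what bottom-up means here, the sketch below builds a tiny in-memory search out of three small pieces. It is not scout’s code or API; the function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical, chosen only to show orthogonal components being composed into something larger.

```python
from __future__ import annotations

from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens. Knows nothing about indexing."""
    return text.lower().split()


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # .get() avoids inserting empty entries into the defaultdict on a miss.
    matches = [index.get(token, set()) for token in tokens]
    return set.intersection(*matches)


if __name__ == "__main__":
    documents = {
        1: "small tools built bottom up",
        2: "large tools built top down",
        3: "small orthogonal pieces",
    }
    index = build_index(documents)
    print(search(index, "small tools"))  # {1}
    print(search(index, "built"))        # {1, 2}
```

Each piece has one job and one invariant that fits in your head: `tokenize` never sees the index, `build_index` never sees a query, and `search` only ever returns ids that are already in the index.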