Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML


Does this look familiar? “Wow. Claude. Mind-blowing. The whole feature works great. But I forgot to mention one very important edge case.” “You’re absolutely right! Let me fix that.” “Ah, and I just noticed. You used offset pagination for the table UI. Obviously cursor pagination is a better fit here?” “You’re absolutely right! Let me fix that.” “Also, is that an N+1 query? Fetching for every row in the table? Why not do a single round-trip?” “You’re absolutely right! Let me fix that.” This is why I still have a job, right?

Peak Slop: I’ve watched this scene play out many times, but the frequency is decreasing. Both my tools and my methods for using them continue to improve. I think Peak Slop has already come and gone. We are entering the post-slop era. My software is more robust, better tested, better integrated, and more observable than ever before. And my velocity keeps increasing!

Some days it feels like the sky is the limit. Other days I am painfully reminded that the sky is not the limit. The context window is the limit. And what happens when I fill the context window? Or kill a session? Switch machines? Hand off the project to someone else? We already know what happens. The agent goes off the rails, or requirements get lost, and critically important details get squashed. So we adapt and mitigate. We document. We list requirements. Yes, millions of us are coming to the same realization: we should put more requirements in writing, and we should update those requirements when they change. Look! I wrote a spec! Am I doing spec-driven development? Perhaps, but it is nothing new. Our mentors tried to teach us these habits decades ago.

Specifying the plane while we fly it: What’s your favorite flavor of spec? A README.md and an AGENTS.md are a good start. Don’t forget a testing-guide.md. Maybe an architecture.md, a PRD.md, and a design doc too. Have you considered md.md (to teach your agents how to write .md)? The more .md the better, right? Unironically, yes. Docs and unstructured specs can get you very, very far. Much farther than prompts alone. If you aren’t writing any docs yet, you should just stop reading this and start there. And remember: slop in, slop out. Nothing beats an organic, pasture-raised, hand-written spec. Spec-writing is where the act of software engineering really happens. So a few weeks ago, I started asking myself: how far can I take this? How far should I take this?

Dreaming in markdown: As the story goes, I fell into an AI psychosis, became a “spec maxxi”, and spent hours and hours writing the most beautiful PRDs and TRDs you’ve ever seen. I drafted templates and skills and roles, thinking that maybe my agents could write specs too! I assembled an army, working together like a mini dark factory, to turn my specs into reality. My tasks grew more ambitious, and at one point I broke the vibe-coding sound barrier: an agent that ran for 1.5 hours unsupervised! Exciting. But what did that army ship for me? Well, it wasn’t slop. In fact it worked, which is more than I can say about the garbage that other companies force me to use every day. But it was still a bit sloppy. I’m far from a perfectionist and I love cutting corners more than most, but this somehow wasn’t good enough. One hallmark symptom of AI psychosis is using AI to build AI harnesses for building products, rather than just using AI to build the damn product. I embraced my illness, threw out the branch, scrapped all my markdown, and started all over again.

Acceptance Criteria for AI (ACAI): A few days later, I noticed an ambitious little sub-agent doing something unexpected. The little guy just went and numbered my requirements and then referenced them all over my codebase. Why? I did not ask for this! I was disgusted. This is a tight coupling of code to spec, and spec to code, which is bad, right? You really expect me to refactor all my code every time I change my spec? Oh. I suppose that’s a good thing? Interesting. I wonder… Perhaps these tags can help me navigate these massive PRs? Perhaps they can point me to where, exactly, a requirement is satisfied or tested! Perhaps I can annotate them with notes and states (todo, assigned, completed)! Perhaps I can start tracking acceptance coverage instead of test coverage! I leaned in. I named these tags ACIDs (Acceptance Criteria IDs). But a few questions remained. Can my ACIDs number and label themselves? Is it cumbersome to keep them aligned? How do I share specs and progress across sandboxes, branches, features and implementations?
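To make "acceptance coverage" concrete, here is a minimal sketch of the idea, not the actual Acai.sh implementation. It assumes a made-up tag format (`ACID:<feature>.<number>`) placed in code comments; the names `ACID_PATTERN`, `find_references`, and `acceptance_coverage`, plus the sample requirement IDs and file paths, are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical tag format: "ACID:<feature>.<number>", e.g. "ACID:orders-table.2".
# This is an assumption for the sketch, not the format any real tool mandates.
ACID_PATTERN = re.compile(r"ACID:([a-z0-9-]+\.\d+)")


def find_references(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each ACID to the sorted list of files whose text mentions it."""
    refs: dict[str, list[str]] = {}
    for path in sorted(sources):
        for acid in sorted(set(ACID_PATTERN.findall(sources[path]))):
            refs.setdefault(acid, []).append(path)
    return refs


def acceptance_coverage(spec_ids: list[str], refs: dict[str, list[str]]) -> float:
    """Fraction of spec'd requirements referenced at least once in the code."""
    if not spec_ids:
        return 0.0
    covered = sum(1 for acid in spec_ids if acid in refs)
    return covered / len(spec_ids)


# Illustrative data: three requirements in the spec, two source files.
spec_ids = ["orders-table.1", "orders-table.2", "orders-table.3"]
sources = {
    "orders/query.py": "# ACID:orders-table.1 use cursor pagination\n"
                       "# ACID:orders-table.2 fetch rows in a single round-trip\n",
    "tests/test_orders.py": "# ACID:orders-table.1 covers the pagination cursor\n",
}

refs = find_references(sources)
coverage = acceptance_coverage(spec_ids, refs)
missing = [acid for acid in spec_ids if acid not in refs]
print(f"acceptance coverage: {coverage:.0%}, missing: {missing}")
```

In this toy data, two of the three requirements are referenced somewhere, and `orders-table.3` shows up as missing. That unreferenced ID is the signal a reviewer would look for. A real tool would presumably also distinguish implementation references from test references and track states such as todo or completed, which this sketch leaves out.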

Acai.sh - an open-source toolkit: I built Acai.sh to solve some of these newly invented problems, and I’m very excited about the results. It includes a simple and flexible template for feature specs, called feature.yaml, which makes it possible to reference each requirement by ACID. It also includes a tiny CLI to power your…