Using AI to write better code more slowly
A lot of people seem convinced that the point of AI coding is to write low-quality code as fast as possible. Spew out barely-passable slop, open massive PRs, and merge them unvetted. Ship it! But LLMs are very flexible, and you can use them just as effectively to write high-quality code more slowly.
This statement seems completely obvious to me at this point, and I almost didn’t write this post for that reason. But enough people seem convinced that LLMs are only good as slop cannons that it’s worth making the opposite case.
If Mythos taught us anything, it’s that LLM agents are really good at finding bugs. Throw them at a codebase enough times, and they will find so many bugs that you’ll barely know what to do with them. Like many others, I’ve found this is also true of non-Mythos models. Some may be better than others at finding subtle bugs or avoiding false positives, but the latest public models from Anthropic and OpenAI are good enough to find plenty of bugs in an unscrutinized codebase.
The problem is not so much finding the bugs as prioritizing and validating them. For this reason I have a Claude skill I adapted from this article’s core insight, which is that the more models you throw at a PR review, and the more those models differ from one another, the less likely you are to get hallucinations or bogus bugs.
The skill says (paraphrasing): Run a Claude sub-agent, Codex, and Cursor Bugbot to find bugs in this PR ranked by critical/high/medium/low. Once they’re all done, review their findings, do your own research to rule out false positives, and write a final report. That’s basically it. You can add your own definition of “bug” if you want. Mine has stipulations about the KISS and DRY principles, writing accessible HTML/JSX, using proper indexes for SQL queries, etc.
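The skill itself is a prompt, not a program, but the consolidation step it describes can be pictured as a small merge over the reviewers' findings. The following is only a rough sketch under stated assumptions: the `Finding` fields, the reviewer labels, and the rule that two findings at the same file and line are the same bug are all hypothetical, and no real reviewer tool is invoked. Validating each finding against the code is still left to the agent and to you.

```python
from collections import defaultdict
from dataclasses import dataclass

# Severity order used for ranking; a lower index means more urgent.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass(frozen=True)
class Finding:
    reviewer: str  # hypothetical label, e.g. "claude-subagent", "codex", "bugbot"
    file: str
    line: int
    severity: str  # one of SEVERITY_ORDER
    summary: str


def merge_findings(findings):
    """Group findings by location and rank them for a final report.

    Assumption: findings at the same file and line describe the same bug.
    Each merged entry keeps the most severe rating it received and the
    sorted list of reviewers that reported it.
    """
    grouped = defaultdict(list)
    for f in findings:
        grouped[(f.file, f.line)].append(f)

    merged = []
    for (file, line), group in grouped.items():
        worst = min(group, key=lambda f: SEVERITY_ORDER.index(f.severity))
        merged.append({
            "file": file,
            "line": line,
            "severity": worst.severity,
            "summary": worst.summary,
            "reviewers": sorted({f.reviewer for f in group}),
        })

    # Most severe first; within a severity, findings that more reviewers
    # agree on come first.
    merged.sort(
        key=lambda m: (SEVERITY_ORDER.index(m["severity"]), -len(m["reviewers"]))
    )
    return merged
```

A finding reported independently by several reviewers is a reasonable candidate to look at first, while a finding from a single reviewer is where the "do your own research" step matters most.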
In my experience, this skill always finds tons of bugs in a PR, and the false positive rate is near zero. It finds so many bugs that you’ll be bored senseless if you try to tackle them all. They’ll range from critical security or correctness bugs to the more mundane medium-severity perf bugs to low-severity “this comment is misleading”-type bugs.
My typical workflow is:
- Have an agent fix all the criticals and highs (with my guidance on the proper solution), then repeat until no criticals/highs.
- Skip highs/mediums where the juice isn’t worth the squeeze (e.g. 100 lines of code to fix a narrow edge case).
- Abandon the PR if it has so many criticals that I realize the whole approach is misguided.
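As a loose illustration, the decision made on each pass of that workflow could be written down like this. It is a sketch, not the actual process: the finding dictionaries, the `not_worth_fixing` set, and especially the numeric `abandon_threshold` are hypothetical stand-ins for what is really a judgment call, and mediums and lows are simply treated as non-blocking.

```python
def triage(findings, not_worth_fixing=frozenset(), abandon_threshold=5):
    """Decide the next step for one review pass.

    findings: list of dicts with "id" and "severity" keys (hypothetical shape).
    not_worth_fixing: ids of highs the human has decided to skip.
    abandon_threshold: hypothetical cutoff; in practice this is a judgment
        call about whether the whole approach is misguided.

    Returns ("abandon", []), ("fix", [findings to fix]), or ("done", []).
    """
    criticals = [f for f in findings if f["severity"] == "critical"]
    if len(criticals) >= abandon_threshold:
        return "abandon", []

    to_fix = [
        f for f in findings
        if f["severity"] == "critical"
        or (f["severity"] == "high" and f["id"] not in not_worth_fixing)
    ]
    # Anything left to fix means another fix-and-review round is needed.
    return ("fix", to_fix) if to_fix else ("done", [])
```

You would call `triage` after every review pass and stop once it returns `"done"` or `"abandon"`.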
Using this technique, I haven’t necessarily seen my velocity go up. If anything, the review process often finds pre-existing bugs, so I end up on a tangential side-quest where I’m writing unit tests and fixing subtle flaws that pre-date the PR. This is the opposite of the “10x productivity” slop-cannon style of development that most people imagine when they think of vibe coding, but I find it very satisfying.
It’s a great way to improve the overall health of the codebase while also teaching you about the odd corners of it. In my experience, the happy path of a complex architecture is less interesting than its failure modes. And pre-LLMs, this was usually how I got familiar with a codebase anyway: understanding where the assumptions break down, and then getting my hands dirty to fix them.
If you’re the kind of person who is skeptical that AI coding is good for anything, then I doubt this post will persuade you. But if you’re the kind of developer who uses agents to write multi-hundred-line PRs that you barely understand yourself, I’d invite you to slow down a bit and try this other, slower style of “vibe coding.”
Ask an agent how your PR works and how it might fail. Have it write Markdown docs with Mermaid charts if necessary. Use Matt Pocock’s /grill-me skill until you understand the entire PR front-to-back. You might not be more “productive” in terms of raw lines of code. You might burn a ton of tokens just to find out that your entire plan was wrongheaded from the start. But I find this style of coding to be a more super-powered version of the kind of programming I was already trying to do before LLMs: careful, methodical, quality-obsessed, focused on making things better for the next coder. So take a deep breath, slow down, try this technique, and see if you don’t enjoy writing better code more slowly.