Vibe coding and agentic engineering are getting closer than I'd like

I recently talked with Joseph Ruscio about AI coding tools for Heavybit’s High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison. Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work.

One thing I really enjoy about podcasts is that they sometimes push me to think out loud in a way that exposes an idea I’ve not previously been able to put into words.

Vibe coding and agentic engineering are starting to overlap

A few weeks after vibe coding was first coined I published Not all AI-assisted programming is vibe coding (but vibe coding rocks), where I firmly staked out my belief that “vibe coding” is a very different beast from responsible use of AI to write code, which I’ve since started to call agentic engineering.

When Joseph brought up the distinction between the two I had a sudden realization that they’re not nearly as distinct for me as they used to be:

Weirdly though, those things have started to blur for me already, which is quite upsetting. I thought we had a very clear delineation where vibe coding is the thing where you’re not looking at the code at all. You might not even know how to program. You might be a non-programmer who asks for a thing, and gets a thing, and if the thing works, then great! And if it doesn’t, you tell it that it doesn’t work and cross your fingers. But at no point are you really caring about the code quality or any of those additional constraints.

And my take on vibe coding was that it’s fantastic, provided you understand when it can be used and when it can’t. A personal tool for you, where if there’s a bug it hurts only you, go ahead! If you’re building software for other people, vibe coding is grossly irresponsible because it’s other people’s information. Other people get hurt by your stupid bugs. You need to have a higher level than that.

This contrasts with agentic engineering where you are a professional software engineer. You understand security and maintainability and operations and performance and so forth. You’re using these tools to the highest of your own ability. I’m finding the scope of challenges I can take on has gone up by a significant amount because I’ve got the support of these tools. But I’m still leaning on my 25 years of experience as a software engineer.

The goal is to build high quality production systems: if you’re building lower quality stuff faster, I think that’s bad. I want to build higher quality stuff faster. I want everything I’m building to be better in every way than it was before.

The problem is that as the coding agents get more reliable, I’m not reviewing every line of code that they write anymore, even for my production level stuff. I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good. But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?

The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on. If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”… I’m not going to go and read every line of code that they wrote. I’m going to look at their documentation and I’m going to use it to resize some images. And then I’m going to start shipping my own features.

And if I start running into problems where the image resizer thing appears to have bugs or the performance isn’t good, that’s when I might dig into their Git repositories and see what’s going on. But for the most part I treat that as a semi-black box that I don’t look at until I need to. I’m starting to treat the agents in the same way.

And it still feels uncomfortable, because human beings are accountable for what they do. A team can build a reputation. I can say “I trust that team over there. They built good software in the past. They’re not going to build something rubbish because that affects their professional reputations.” Claude Code does not have a professional reputation! It can’t take accountability for what it’s done. But it’s been proving itself anyway: time and time again it’s churning out straightforward things and doing them right in the style that I like.

There’s an element of the normalization of deviance here: every time a model turns out to have written the right code without me monitoring it closely, there’s a risk that I’ll trust it at the wrong moment in the future and get burned.
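For readers who want a concrete picture of the task mentioned above (a JSON API endpoint that runs a SQL query and outputs the results as JSON), here is a minimal sketch using only the Python standard library. It is an illustration, not code from the podcast or from any agent: the `items` table, the fixed query, and the function names `run_query`, `make_app`, and `demo_connection` are all hypothetical.

```python
import json
import sqlite3


def run_query(conn, sql, params=()):
    """Run a SQL query and return the rows as a list of dicts keyed by column name."""
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def make_app(conn):
    """Build a WSGI app that serves one fixed query as JSON (hypothetical endpoint)."""
    def app(environ, start_response):
        # The query is hard-coded, so no user input reaches the SQL string.
        rows = run_query(conn, "SELECT id, name FROM items ORDER BY id")
        body = json.dumps({"rows": rows}).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


def demo_connection():
    """Create an in-memory database with a small, made-up `items` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)", [("apple",), ("pear",)])
    return conn
```

A sketch like this is small enough to read in full, which is part of the point: the open question in the text is not whether such code can be reviewed, but whether it still gets reviewed once the agent has produced it correctly many times in a row.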

The new challenge of evaluating software

It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project. And now I can knock out a git repository with a hundred commits and a beautiful readme and comprehensive tests of every line of code in half an hour! It looks identical to those projects that have had a great deal of care and attention. Maybe it is as good as them. I don’t know. I can’t tell from looking at it. Even for my own projects, I can’t tell.

So I realized what I value more than the quality of the tests and documentation is that I want somebody to have used the thing. If you’ve got a vibe coded thing which you have used every day for the past two weeks, that’s much more valuable to me than something that you’ve…