The LLM Is Not a Junior Engineer

April 29, 2026

A collection of different thoughts about how LLMs might be thoughtfully incorporated into modern software development practices.

In the wake of my last essay on why I don’t vibe code, I heard from various people on the Internet who read it (or the Bluesky skeets that inspired it).

Some had adopted similar stances on their own, often for overlapping reasons. Others were dedicated vibe coders who wanted to share what practices they used that made things manageable for them. I appreciate the feedback! I’m not going to change my stance, but it’s good to get perspective on how actual developers (and not senior executives or corporate marketing materials) have engaged with this technology.

Contrary to how I might seem at times, I do think there could be a place for LLMs in modern software development. I do understand why some developers are entranced by the ability to create applications in hours and turn prototypes into products. I see how it could be highly useful for home hobbyists, back-office bureaucrats, and expert engineers to build the low-stakes software they’ve never had the time or skill for.

But I think it’s also incredibly dangerous to assume these same benefits apply equally to large software engineering teams or more sensitive software applications. It might be one thing if your vibe-coded personal recipe app mangles a measurement conversion; it’s entirely different if it destroys your production database (even if it pretends to be really sorry afterwards).

It’s one thing if you stay up late coding on your personal project (I can’t criticize), but another if your team is just merging in code without testing or reviews because you’re too afraid to be the person who is slowing the team down. And I haven’t seen many public examples where teams have set explicit bounds on how and where they want to use AI.

I haven’t been able to find many good examples of how teams are using Generative AI (GenAI) effectively. Much of the literature is still very much in the early phase of the AI lifecycle, with many articles exulting more in how AI replaces the need for software teams than in how it might boost them.

Perhaps there are people figuring it out, but it feels like we’re going to continue to see new land-speed records in self-owning before we learn how to do it right. And that’s probably how it will be. Software development as a practice has been informed by decades of experience of how to do things terribly wrong, and many of the standard practices we follow today were learned the hard way.

Maybe we need a few more years of companies deleting their entire code base or shipping with critical errors (perhaps with a few high-profile outages or bankruptcies in the mix) before the industry figures out how to sustainably work with these new tools. These are some of the things I’ve been thinking about, though.

The LLM Is Not a Junior Engineer

First, I need to get something off my chest. It’s fairly common in our industry to anthropomorphize GenAI products and describe them as junior engineers or similar low-level coworkers. Stop it!

While it may be useful to think of LLMs as interns instead of gods, this framing still grants AI a conceptual personhood that makes it seem more capable and reliable than it actually is. And it’s highly insulting to the actual junior engineers in the industry, who are usually some of the most talented and hard-working individuals you will find.

An AI model is not a person, or even sentient. It has no long-term memory. It has no internalized morality of what constitutes good or bad behavior (apart from what might be implicitly reflected in its training data and reinforced in post-training calibration). It doesn’t learn from its own actions.

Instead, developers make it “learn” by carefully crafting introductory texts that tell the LLM how to act and what things to avoid. And it mostly seems to work, as long as someone remembers to tell the AI to stop talking about goblins so much. It is indeed impressive and a little magical that it does work so well most of the time, but there are no guarantees that it won’t go wrong either.
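As a rough illustration of what that “teaching” looks like in practice, here is a toy steering prompt that gets prepended to every request. The prompt text, message format, and function names are all invented for this sketch; no particular product works exactly this way.

```python
# Toy illustration (not any vendor's real API): the only way the model
# "knows" how to behave is via instructions resent with every request.

SYSTEM_PROMPT = """\
You are a coding assistant for the payments team.
- Follow the conventions in STYLE.md.
- Never modify files under migrations/ without asking first.
- If you are unsure an API exists, say so instead of guessing.
"""

def build_request(user_message):
    # Nothing persists between sessions; drop the system prompt and all
    # of this "learned" behavior vanishes with it.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

print(build_request("Add a retry to the checkout call."))
```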

To help the LLM build on previous steps, many coding agents will write to and read from a working memory file. This memory itself is also included as part of the inputs into the model for each new step, which means the longer the LLM is used on a single problem, the slower and more expensive each successive query gets, to the point where some engineers have reported hitting their weekly usage quotas within a single day.
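A minimal sketch of that dynamic, with a dummy stand-in for the model call (the function and message format here are assumptions, not a real API): because the whole transcript is resent on every step, per-step input grows with conversation length, and total token spend grows roughly quadratically.

```python
# Sketch of an agent loop: the entire history rides along on every call,
# so each step's input (and cost) is larger than the last.

def call_model(messages):
    """Hypothetical stand-in for an LLM API call."""
    return f"(reply after reading {len(messages)} messages)"

history = []  # the agent's working memory: every prior step

def run_step(user_input):
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)  # resends ALL previous turns as input
    history.append({"role": "assistant", "content": reply})
    return reply

for step in ["plan the fix", "write the patch", "run the tests"]:
    print(run_step(step))  # 1, then 3, then 5 messages sent
```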

And when the LLM finally fills out its limited context window, all sorts of wrong things will happen, from API errors to selective amnesia, “lost-in-the-middle” confusion, and trouble inferring responses to new prompts. To mitigate this, some agentic models will include processes to summarize and compact their own memories; this is a lossy compression by its very nature, so there is some risk of distortion and loss there.
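In outline, such compaction might look something like the sketch below. The token heuristic, budget, and summarize callback are assumptions made up for illustration, not any agent’s actual implementation.

```python
# Hypothetical sketch of context compaction: once the transcript nears an
# assumed token budget, older turns are folded into a model-written summary.

MAX_TOKENS = 8000   # assumed budget; real context limits vary by model
KEEP_RECENT = 10    # recent turns kept verbatim

def rough_tokens(messages):
    # Crude heuristic: roughly four characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(history, summarize):
    """Replace older turns with a summary once the budget is near."""
    if rough_tokens(history) < MAX_TOKENS:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    # The digest is itself LLM-generated text: a lossy compression, so
    # details in `old` can be distorted or silently dropped.
    digest = summarize(old)
    summary_turn = {"role": "system", "content": "Summary of earlier work: " + digest}
    return [summary_turn] + recent
```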

Others will regularly just start over with new agents that, without active intervention, can stumble into the same mistakes and suggestions as their predecessors. All of which is to say: if an LLM were a person (to be clear, he is not), he would be an absolute nightmare to work with.

Every LLM agent is essentially a combination of Amelia Bedelia and Leonard Shelby from Memento. It is up to you to tell him precisely what he should or should not do. And it is also your responsibility to keep a written memory that he will need to reference constantly to know how to make sense of his world.

Also, he doesn’t know or care about anything else outside of his written notes, and you will feel intense pressure to keep his notes as concise as possible without messing up. And when you do mess up, other developers will blame you and not him. Your corporate culture and values? Your general approach to architecture and testing? Your long-term product road map? He doesn’t know them and doesn’t care.