Lines of code got a better publicist

It’s fifteen years ago (bear with me, I’ve been in this industry since the late 90s, and most of my good stories start this way), and you’ve got two senior developers at a SaaS company. One of them writes 40% more lines of code than the other. Is that developer better? More impactful for the business? Should the other one be polishing their CV? Of course not. You’d want to know what actually shipped: what it did for customers, for revenue, for reliability.

Lines of code, PR counts… we spent a couple of decades learning these are notoriously bad ways to measure a developer, to the point where suggesting them today is laughable.

Sooooo… here’s what the industry put on the billboard this year. Google: 75% of new code is AI-generated. Anthropic: ~80% of merged production code is written by Claude, and engineers ship “8x more code per quarter”. OpenAI: also ~80%, apparently. Cursor: “100M+ lines of enterprise code written per day”.

Every single one is a volume claim. “Percent of code written by AI” is just lines of code with a better publicist.

(The sceptic in me, editing this draft, would like to point out that it’s no coincidence all of these come from AI vendors of some kind, for whom pumping adoption is pretty important.)

We used to claim outcomes

Rewind a few years and the headline number was different in kind, not just size. GitHub’s flagship claim was that developers completed tasks 55% faster with Copilot. Say what you like about that study (plenty did), but it was an outcome claim. Bold, falsifiable, about value. If it was wrong, you could show it was wrong.

The 2026 claims can’t fail. That’s the genius of them: “75% of our code is AI-written” could be true, and will keep going up, regardless of whether anything got better (faster delivery, fewer incidents, happier customers, etc.). A volume number can only ever disappoint you if adoption stalls, and adoption is the one thing most of us agree is real. 📈

So the claims got bigger and started saying less.

What happened in between? The bit nobody puts on a billboard

The outcome evidence got complicated, that’s what happened. The strongest pro-adoption result is still Cui et al.: nearly 5,000 developers, +26% completed tasks, with the biggest gains for junior devs. Not really in dispute. But then GitClear showed code churn rising and refactoring collapsing as Copilot adoption deepened. Then METR ran the study many have quoted: experienced open-source devs were 19% slower with AI in their own codebases, while believing they were 20% faster.

But! Hold my beer… in February 2026 METR effectively walked it back: their follow-up estimates flipped to a speedup (with error bars wide enough to ride a Moto Guzzi, with panniers, through!), and they abandoned the study design entirely - because developers now refuse to work without AI, and can’t reliably self-report time on agentic work. Their latest position: AI probably speeds developers up in 2026, and we can no longer cleanly measure by how much.

Meanwhile, at the company level, an NBER survey of ~6,000 executives found 69% of firms actively using AI and roughly nine in ten reporting no measurable productivity impact. The cross-study consensus sits somewhere around 10% organisational gains. Not nothing! Still bloody useful! Buuuut, also not “you don’t need developers anymore” territory.

And if you’re a sceptic still quoting “19% slower”, you’re cherry-picking too. The research keeps updating; the industry just changed what it counts.

Vanity metrics, now in AI flavour

It’s not just AI vendor claims, to be fair. Carnegie Mellon’s SEI and Accenture launched an AI Adoption Maturity Model just a few days ago: five levels, eight dimensions, marketed off a stat about 95% of organisations seeing no returns. Steve Yegge’s “8 levels of AI-assisted development” ranks you by which tools you run and how much supervision you give them. And every tools vendor now ships a maturity ladder whose top rung is, usually, “use more of our product”.

These ladders measure adoption intensity and call it maturity. Same substitution, nicer packaging.

My favourite data point in this whole genre: Augment surveyed 219 engineering leaders and asked them to define “AI-native engineering”. They got 219 different answers. 🫠

And the prize for holding both ends of the rope goes to Anthropic, who gave us the “8x more code shipped” claim and one of the more rigorous studies of the year: an RCT finding that AI-assisted developers scored 17% lower on comprehension of the code they’d just shipped, with no statistically significant productivity gain. I use Claude every single day (it recommended half the links I read for this post, so the irony is not lost on me), the products are genuinely excellent, and their research arm updates while their marketing arm counts volume. Both things are true at once, which is kinda the point.

Why I actually care

Because these numbers aren’t decorative. They move budgets, performance expectations, and headcount plans. In February, Jack Dorsey cut over 40% of Block’s workforce (4,000+ people) with AI as the explicit core thesis: “A significantly smaller team, using the tools we’re building, can do more and do it better.” A couple of weeks later, Atlassian cut 10% (~1,600 people), while conceding it would be “disingenuous to pretend AI doesn’t change the mix of skills we need or the number of roles required”.

And there’s a key detail that gets me: Dorsey said, in the same announcement, that the business was strong and gross profit was growing.

When a company says “AI made everyone more productive, so we need fewer people”, I want to see the evidence - and I don’t believe it exists today. Show me that x% of your workforce is genuinely idle (or even just underutilised) because the work can now be done by fewer people. Even then: I’ve never seen a product/SaaS company that didn’t have an endless roadmap. If you got a free headcount increase essentially overnight, why wouldn’t you use it to deliver more value?