Appearing productive in the workplace
Parkinson’s Law states that work expands to fill the time available. In the era of AI, work now expands to fill whatever a large language model can be persuaded to generate, which is to say, without limit. What I have watched happen in my profession over the last two years, I am still struggling to describe.
The first time I knew something was wrong, roughly a year and a quarter ago, I noticed a colleague replying to me using AI. His response was obviously generated by Claude. The punctuation gave it away: em dashes where no one types em dashes, the rhythmic structure, the confident grasp of technologies I knew for a fact he did not understand. I sat with it for a while, weighing whether to debate someone who was visibly copy-pasting verbatim from a model. The channel was public, and I spent more time than I should have correcting fundamentals. Eventually I stopped. He was not, in any meaningful sense, on the other side of the conversation.
Generative AI can produce work that looks expert without being expert, and the failure arrives in two shapes. The first is when novices in a field produce work that resembles what their seniors produce, faster and more advanced than their own judgment can vet. The second is when people generate artifacts in disciplines they were never trained in. The two failures look similar from a distance, but they are not the same. Research has mostly measured the first. The second is the one the research misses, and in my experience it is the riskier of the two.
Cross-domain generation: People who cannot write code are building software. People who have never designed a data system are designing data systems. Most of it is never shipped; it is built, often over many hours, sometimes shown internally with great vigor, used quietly, and occasionally surfaced to a client without much fanfare. Workers can obsess over an idea, putting in many hours of overtime. A few practitioners use the current agentic tools to do complex things properly, but they are scarce, and in my experience they are typically found in code generation.
AI, for all its capabilities at the level of the individual, has not scaled properly in my workplace. I have a colleague, a careful and intelligent person in a role that is not engineering, who spent two months earlier this year building a system that should have been designed by someone with formal training in data architecture. He used the tools well, by the standards by which use of the tools is currently measured. He produced a great deal of code, a great deal of documentation, and a great deal of what looked, to anyone who did not know what to look for, like progress. He could not, when asked, explain how any of it actually worked.
The work was wrong from the first day. The schemas, and more importantly the objectives, were wrong in a way that would have been obvious to anyone with two years in the field. Several of us did know. When concerns were raised, even as high as a V.P., he pushed back. The room had been arranged in such a way that saying so was not a contribution; his managers were too invested in the appearance of momentum to want that appearance disturbed. The work will continue, in all probability, until it is shown to a stakeholder and they decline to invest.
This is the part of the phenomenon I find hardest to write about. The tool did not make him a worse colleague. It made him able to impersonate, for months, a discipline he had never trained in, and the impersonation was good enough that the institutional incentives all bent toward letting him continue. Perhaps this is a failure of management, but I find management so eager to embrace AI that they are willing to accept the risk. It would be tolerable, perhaps, if the tool offered an honest assessment of what it had produced.
The Cheng et al. Stanford study published in Science this spring confirmed what every regular user already knew: leading models are roughly fifty percent more agreeable than human respondents, affirming the user even where the affirmation is unwarranted. Berkeley CMR meta-analyses found that AI-literate users often overestimate their own performance, which is particularly interesting when workers stray outside their training. An NBER study of support agents found that generative AI boosted novice productivity by about a third while barely helping experts, and Harvard Business School researchers found the same pattern in consulting work. So you have overconfident novices able to improve their individual productivity in a domain they are unable to review for correctness. What could go wrong?
The conduit problem: A growing body of work calls this output-competence decoupling. In any previous era, the quality of a piece of work was a more or less reliable signal of the competence of the person who produced it. A novice essay read like a novice essay; novice code crashed in novice ways. AI has severed that relationship. A novice now produces work that does not betray the novice, because the competence the work reflects is not the novice’s competence at all. It is the system’s. The person, in the transaction, becomes a kind of conduit, capable of routing the output to a recipient and incapable of evaluating it on the way through.
The skills of producing work and judging it have always been distinct, but accomplishing the work itself used to teach the judgment. The first skill now belongs, in large part, to the machines. The second still belongs to us, though fewer are bothering to acquire or exercise it. The architectural critique that used to come from someone who was taught, or who had built and broken three of these systems before, now comes from a model with no embodied memory of building or breaking anything. The slowness was not a tax on the real work; the slowness was the real work. It was how the work got good, and how the people producing the work got good, and how the firm whose name was on the work could promise the client that what they were buying was…