The missing step between hype and profit

This story originally appeared in The Algorithm, our weekly newsletter on AI.

In February, I picked up a flyer at an anti-AI march in London. I can’t say for sure whether its writers meant to riff on South Park’s underpants gnomes. But if they did, they nailed it: “Step 1: Grow a digital super mind,” it read. “Step 2: ? Step 3: ?”

Produced by Pause AI, an international activist group that co-organized the protest, the flyer ended with this plea to the reader: “Pause AI until we know what the hell Step 2 is.”

In the South Park episode “Gnomes,” which first aired in 1998, Kenny, Kyle, Cartman, and Stan discover a community of gnomes that sneak out at night to steal underpants from dressers. Why? The gnomes present their pitch deck. “Phase 1: Collect underpants. Phase 2: ? Phase 3: Profit.”

The gnomes’ business plan has since become one of the great internet memes, used to satirize everything from startup strategies to policy proposals. Memelord in chief Elon Musk once invoked it in a talk about how he planned to fund a mission to Mars.

Right now, it captures the state of AI. Companies have built the tech (Step 1) and promised transformation (Step 3). How they get there is still a big question mark.

As far as Pause AI is concerned, Step 2 must involve some kind of regulation. But exactly what it will call for and who will enforce it are up for debate. AI boosters, on the other hand, are convinced that Step 3 is salvation and tend to gloss over the middle bit.

They see us racing toward sunny uplands on the back of an “economically transformative technology,” as OpenAI’s chief scientist, Jakub Pachocki, put it to me a few weeks ago. They know where they want to go, more or less: It’s hazy up there and still some way off. But everyone’s taking a different route. Will they all make it? Will anyone?

For every big claim about the future, there is a more sober assessment of how the rubber meets the road, one that quells the hype. Consider two recent studies. One, from Anthropic, predicted what types of jobs are going to be most affected by LLMs. (A takeaway: Managers, architects, and people in the media should prepare for change; groundskeepers, construction workers, and those in hospitality, not so much.)

But those predictions are really just guesses, based on what kinds of tasks LLMs seem to be good at rather than how they really perform in the workplace. Another study, put out in February by researchers at Mercor, an AI hiring startup, tested several AI agents powered by top-tier models from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks frequently carried out by human bankers, consultants, and lawyers. Every agent they tested failed to complete most of its duties.

Why is there such wide disagreement? There are a number of factors. For a start, it’s crucial to consider who is making the claims (and why). Anthropic has skin in the game. What’s more, most of the people telling us that something big is about to happen have reached that conclusion largely on the basis of how fast AI coding tools are improving.

But not all tasks can be hacked with coding. Other studies have found that LLMs are bad at making strategic judgment calls, for example. What’s more, when they’re deployed, the tools aren’t just dropped into a cleanroom. They need to work in places contaminated with people and existing workflows.

And sometimes adding AI will make things worse. Sure, maybe those workflows need to be torn up and refashioned around the new technology for it to achieve transformative status, but that will take time (and guts). That big hole? It’s right where Step 2 should be.

The lack of agreement on exactly what’s about to happen, and how, creates an information vacuum that gets filled by the latest wild claim of the week, evidence be damned. We’re so unmoored from any real understanding of what’s coming and how it will be deployed that a single social media post can (and does) shake markets.

We need fewer guesses and more evidence. But that’s going to require transparency from the model makers, coordination between researchers and businesses, and new ways to evaluate this technology that tell us what really happens when it’s rolled out in the real world.

The tech industry (and with it the world’s economy) rests on the held-out promise that AI really will be transformative. But that is not yet a sure bet. Next time you hear bold claims about the future, remember that most businesses are still figuring out what to do with their underpants.