General Intuition’s $2.3B bet that video games can train AI agents for the real world


As soon as I entered General Intuition’s R&D floor at its New York office, the company’s 31-year-old co-founder and CEO Pim de Witte directed my attention to a monitor perched on a standing desk. Someone appeared to be playing something like Fortnite. It wasn’t a person. “Our agent has been playing for 100 hours straight,” Kent Rollins, the company’s chief product officer, said, beaming.

Before I could get absorbed in the spectacle of an AI navigating the game’s virtual environment, I heard the electronic footsteps of a large quadrupedal robot approaching. “The same brain powering the agent playing the game is powering the robot,” de Witte told me. Josh Duplantis, a data analyst carrying a laptop streaming a live feed from the robot’s single camera, piped up to explain that the bot’s default mode was “exploration.”

Relying on that camera, its singular eye, the giant buglike bot walked up to me, circled around me, and continued into the office. It occasionally clipped the legs of chairs or bumped into an errant trash bin, much like a toddler who hasn’t yet learned how her body relates to the world around her. Duplantis said it took just eight minutes of real-world robotics data to fine-tune an AI model for the quadruped. What’s more, that data was collected on the street, not inside the office the bot was now navigating.

An agentic model that can generalize from gameplay to simulation to embodiment is General Intuition’s raison d’être. And that model’s ability to figure out its place in the world has secured the backing of some heavy hitters. On Thursday, General Intuition said it raised $320 million at a $2.3 billion valuation, confirming TechCrunch’s previous reporting. The round brings General Intuition’s total disclosed funding to $454 million, after the $134 million round it raised at launch last October.

The startup was spun out of de Witte’s other company, Medal, which allows gamers to upload and share video game clips. The hundreds of millions of hours of uploaded gameplay provided the initial dataset to train General Intuition’s model in spatial-temporal reasoning, or understanding how to move through space and time. But the key ingredient wasn’t the gameplay footage; it was the action labels embedded in those clips: records of exactly what buttons a player pressed and when.

Most competitors, de Witte says, are trying to infer actions from video alone, which he argues is insufficient. “We view this as just the next stage of future pre-training,” de Witte said. “We have a single model that can respond to Fortnite information on the screen and take action, but also to real-world dynamics in a way that an LLM could never.”

At one point, de Witte set me up with a laptop running General Intuition’s world model, a simulated environment generated frame-by-frame rather than rendered by a traditional game engine. As I often do when testing world models, I walked straight into a series of walls. In other demos I’ve tried, the agents you control sometimes pass right through, but this one didn’t. From the millions of hours of gameplay, it somehow learned that walls are walls, ladders are for scaling, and shadows lengthen as the sun moves.

For General Intuition, this world model isn’t the product; it’s the training environment (referred to as “the gym” internally). The company ultimately wants to sell the agentic model itself, and de Witte argues that the action data embedded in gameplay helps the model discern the “self” from the “environment” in a way that gives it a richer understanding of causality.

Impressive though General Intuition’s technology appears in demos, the company isn’t the only one trying to crack this problem. Moreover, getting such a model to hold up in the physical world, at scale, hasn’t yet been done. Most approaches of this kind require enormous amounts of real-world data that’s gathered slowly and expensively. General Intuition’s bet is that gameplay is a scalable shortcut. Its investors are okay with that bet, too.

General Intuition’s latest round was led by Khosla Ventures, with participation from General Catalyst, Jeff Bezos, Eric Schmidt, Nico Rosberg, and researchers at Google DeepMind and MIT. The vast majority of the round will go toward scaling compute capacity. General Intuition has a deal with CoreWeave and plans to focus on pre-training the next version of the model. A slice has been earmarked for making its API more broadly available by the end of summer.

Vinod Khosla, whose firm led the round, says he was drawn to de Witte’s vision and the company’s proprietary data position. “If you look at LLMs, when reasoning emerged, it was a quantum leap,” Khosla told me in a phone interview. “In world models, I think the quantum leap is the emergence of intuition in the AI, a human intuition-like capability. The human action data and reaction data you have in games is the key part to the emergence of intuition.”

The vision is a generational company

Image caption: General Intuition relies on data from Medal’s video game clips. Image Credits: Medal.TV.

General Intuition isn’t the only company to notice that Medal’s human action data is a key piece of the puzzle of building dynamic world models and general agents. Brianna Martin, the startup’s chief of staff, said the company was born, in part, after Medal turned down an acquisition offer from a major lab. There have been other offers since, too.

De Witte and his co-founders, Eloi Alonso, Adam Jelley, and Vincent Micheli, aren’t interested in being acquired, and neither are the startup’s investors looking for an exit just yet. The amount and quality of proprietary data General Intuition has via Medal is one of the reasons Khosla is convinced the startup is a generational bet, not an M&A target; that it could become the backbone for generalized agents and world models in simulation and the real world. “At this point, it would be a data acquisition, which is sort of uninteresting,” Khosla said. Part of that bet also in…