Google’s Genie world model can now simulate real streets with Street View

We’ve all pulled up Street View on Google Maps to show a friend what our childhood home looked like, or dropped that little person icon onto the streets of Paris to see if we booked a hotel in a cool neighborhood. Imagine being able to do that, but in a more immersive, interactive way that lets you simulate the street and its surroundings, and even adjust the weather or see what it would look like in a “Day After Tomorrow” scenario. That’s one of the goals of Google’s latest integration.

Starting today, Google DeepMind is connecting Street View to Project Genie, the company’s general-purpose world model that can generate diverse, interactive environments. The new feature launched during the Google I/O 2026 developer conference.

“It’s really powerful for both the agent [and robotics] use case and for humans to play with, and that’s always been the thesis of Genie,” Jack Parker-Holder, a research scientist on DeepMind’s open-endedness team, told TechCrunch. He gave the example of a new robot being deployed in London, which rarely sees the sun. Genie could, Parker-Holder says, simulate those scarce occasions when the sun glints off the Victorian housing, so the rays don’t shock the robot when that happens.

“Simultaneously, you might say, ‘I’m going to New York City, but not this time of year,’” he continued. “‘It’s going to be snowy. I want to see what that block looks like in the snow.’”

Google has been collecting Street View data for 20 years via camera-equipped cars and people wearing “Trekker” backpacks. The tech giant has collected more than 280 billion images across 110 countries and seven continents.

“With Street View, we have imagery from a large quantity of the world,” Parker-Holder said. “You can imagine how potentially powerful it is to combine this rich source of real-world information and data with an ability to simulate worlds.”

Google released its latest world model, Genie 3, as a research preview last August and opened access to Google AI Ultra subscribers in the U.S. in January, allowing customers to create interactive game worlds from text prompts or images. The goal is to use Genie for educational experiences, gaming, and robotics training. Genie 3 is already helping to power one of Waymo’s simulators to train its self-driving cars on “exceedingly rare events” like tornadoes or casual elephant encounters. Adding Street View data could help Waymo prepare to launch in more cities around the globe.

Waymo has its own simulator, which it relied on to scale to 11 U.S. cities and to test its AI driver in several more. The difference with Genie, says Parker-Holder, is that Waymo’s simulations are all rendered from the car’s point of view. Street View allows for not only simulating a world anchored to a real place, but also shifting the point of view to other types of agents, like a human or a robot.

Google is launching Street View in Genie to some Ultra users in the United States starting today, with access rolling out at scale over time. Global Ultra users will gain access over the next few weeks, per the company.

The researchers’ goal is to put this new capability into as many hands as possible, per Diego Rivas, a product manager at DeepMind. He cautioned that Street View in particular, and Genie in general, are still experimental, so there’s much to improve in terms of accuracy. In the samples the Google team showed me, including an underwater simulation of a neighborhood I used to live in, the results are impressive and recognizable, but still video-game quality rather than photorealistic.

The models are also not yet physics-aware, meaning they don’t understand cause and effect. For example, in a simulation of a woman running through a snowy Joshua Tree, she ran right through cacti and bushes. Compare that to, say, Google’s image generator Nano Banana, which can now generate perfect text in infographics, or its video generator Veo, which understands that paper boats drift on water currents, smoke disperses into the air, and fabric drapes over forms. Physics isn’t hard-coded into these models; they learn it intuitively over time through passive observation, as a living being would.

“I think for this kind of model, it’s maybe six to 12 months behind video in terms of the accuracy and quality, so I think it’s something we will solve,” Parker-Holder said.

Jonathan Herbert, a director of Google Maps who started on the Street View team as an intern 12 years ago, said that Genie can’t yet create a faithful reconstruction of a street. He thinks the real breakthrough is the AI’s spatial continuity: if you turn 360 degrees, the AI correctly remembers and simulates the environment behind you. From that point on, the model can build a new environment on top of that.

“We have long thought about how we can build out the best and richest model of the world on top of Street View data,” Herbert said. “It’s definitely been an idea of ours to use Maps data in new ways and for new kinds of AI research for a pretty long time.”