I Cloned Myself With Gemini’s AI Avatar Tool. The Result Was Unnervingly Me
It’s a beautiful, balmy afternoon at Dolores Park in San Francisco, and I’m singing a birthday song to a prehistoric dinosaur. A cupcake with a pink candle magically appears in my empty hand as I finish my serenade. When I blow out the flame, a calm look of contentment washes over the CGI-esque creature.
While the man in this AI video looks and sounds just like me, the clip was actually generated using one of the new features available in Google’s Gemini app: avatars. An avatar is a digital clone of you that can be inserted into AI videos, similar to the core feature of OpenAI’s now-defunct Sora app. Avatars are powered by Google’s new Omni video model, and the feature is only available to subscribers.
I pay $20 a month for Google’s AI Pro plan and quickly maxed out Gemini’s usage limits, which reset every five hours. I simply asked a few questions and generated two 10-second clips featuring my avatar before I was told to wait until later.
My first two glimpses of what Omni can do with my likeness were of me singing to a dino in San Francisco and surfing under the Golden Gate Bridge. I was simultaneously impressed and freaked out. The content was cringeworthy, with some jumbled moments and nonsensical outfits, but that man in the video was me. I used my fingers to zoom in on its face and really watch the mouth move. The teeth were a bit off, but otherwise that’s Reece, right on down to the chin fat.
Unlike OpenAI, which previously let users decide whether they wanted others to generate AI videos using their likeness, Google only lets adult users make videos with their own avatar.
It took me about five minutes to set up my avatar through the Gemini app. The process involved sitting in a well-lit room with my phone’s camera pointed at my face and reading a string of two-digit numbers. Then I slowly looked to the right and swiveled my head to the left, and it was all over. Reece 2.0 was born and ready to be my deepfake star. (Be mindful of what you’re wearing during this process, since your fit will likely show up in the AI generations, but more on that later.)
Let’s break down the birthday clip frame by frame to really unpack my feelings here. Full prompt: Generate a video of me singing the happy birthday song to an aging dinosaur at the top of the hill at Dolores Park.
The first second starts with a millennial pause, because even AI Reece has some ingrained habits. What’s most striking initially is the photorealistic setting. Rather than placing my avatar on some oversized hill at a random park, the background of Google’s AI video is remarkably similar to the actual location. From the palm-tree-lined sidewalks to the looming Salesforce Tower in the distance, it’s immediately evident which park is depicted here, even though the output isn’t perfect. It makes sense that a company known for mapping the planet could pull this off.
As AI me started to sing, with a less pitchy baritone than I can actually pull off, the first few bars seemed natural. I bounced my hands up and down on the beat, like a mini conductor. Then I stuttered on the word “to,” and Gemini cut to a wider-angle shot as the real chaos began. A vanilla cupcake appeared randomly, and I exhaled a cloud of smoke to blow out the celebration candle. (Honestly, how rude of AI Reece. It’s not your special day.)
The other AI clip I generated using the avatar feature also blended chaotic moments with lifelike shots of me talking to the camera. Full prompt: Generate a video of me surfing beneath the Golden Gate Bridge.
Instead of putting me in a wetsuit, Gemini dressed me in head-to-toe denim. No shoes on the surfboard, at least, I guess. This AI generation included shots that looked as if they were captured on a GoPro attached to the surfboard.
As more people use generative AI, especially models without strict guardrails, these tools are being used increasingly to target women with nonconsensual deepfakes. Google claims it has safety at the forefront as it rolls out this new feature. “We try to prevent harm,” says Nicole Brichtova, who leads the product team working on Omni at Google DeepMind. “And, we try to do it in a way where we’re not blocking benign things.”
Despite the stuttering and other errors in the clips of AI Reece, these hyper-realized versions of myself felt more real than when I listen back to a voicemail or rewatch a clip of a fun weekend out. The avatar didn’t necessarily look like a hotter version of myself. No, it was something eerier. My digital clone was seamless Reece. Always ready to be anywhere, to do anything, to be me.