Google’s new anything-to-anything AI model is wild
Omni sent my kid’s stuffie rafting and deepfaked me in front of the Eiffel Tower. But it’s not quite the singularity.
Last year I deepfaked my kid’s stuffed animal to make it look like his plush deer was on vacation. It was an experiment to see if I could re-create the events depicted in a Gemini ad Google was running, and I never showed the videos of Buddy the deer on his adventures to my four-year-old. But it was a revealing exercise that made me think a lot about the difference between some harmless fun with generative AI and full-on slop. Maybe that Venn diagram is a perfect circle! Maybe not. But what I know for sure is that the tools to make realistic videos are surprisingly good, requiring surprisingly little effort and know-how. And that trend is continuing hot into Gemini’s Omni era.
Omni is a new family of generative models that will allegedly one day be able to turn any kind of input (photo, video, text) into anything else. But for starters, it’s just creating video. Omni Flash is the first of these models Google has released, now available in the company’s AI video generation and editing platform, Flow. You can still use the previous model, Veo, if you want, but Omni improves on Veo in a few ways.
With Omni, you can upload a video and use that along with a text prompt as the starting point for your AI-generated creation. Google also claims Omni incorporates more real-world knowledge when producing videos and can do a better job of keeping characters consistent throughout a video as a result. There was only one way to really know if those claims are true: I brought back AI Buddy to pack his little AI-generated bags for another adventure.
The results are such a mixed bag they’re baffling. Some were very good: much more consistent and true to my prompt than when I was testing out Veo five months ago. But even the best clips Omni cooked up for me still have certain AI jump scares, like when Buddy suddenly switches orientation while he’s skydiving.
For another video, I gave Omni some artistic freedom. “Create a montage of Buddy packing for a vacation and embarking on a cruise ship for a tropical vacation. The mood is cute and playful. Buddy packs something funny in his suitcase that comes into play later in the clip.” It had Buddy pack a jar of honey; later in the clip he reaches for it as if it’s a bottle of sunscreen. “Uh oh,” the character says as he squirts honey onto his hoof.
Honestly, not a bad bit. Except that the honey container constantly changes throughout the video, from a jar, to a clear squirt bottle filled with water, then back to a squeeze bottle filled with honey. And I can’t even begin to describe how the model came up with the final frame of the video; it’s almost as if it just barfed up a bunch of elements of the sequence it just made.
You can use text-based prompts to suggest edits to your videos, and I’ll give Google credit: This works better with Omni than it did when I tested Veo 3. But the results were bad with Veo, so bad that I found it way easier to just prompt a new video from scratch every time I wanted something changed. Omni will actually take your edits on board, but the results don’t always hit.
I had it emphasize Buddy’s facial reactions in his vacation clips, and the results just wound up looking strange. It would also give Buddy antlers from time to time, which he does not have. Buddy is a baby, thank you very much. When I prompted it to remove the antlers that appeared in one scene, it obliged, and then added antlers in all the other ones.
The thing is, none of this is free. Generating videos costs credits, varying from 15 to 40 credits based on the length of the scene and the “ingredients” you start with. One round of edits costs 40 credits. I have the $20-per-month AI Pro plan that comes with 1,000 credits each month. After around 20 clips generated with a few edits on some, I’m down to 145. If you have specific ideas about the video you want Omni to generate, you might be looking at a lot of costly back-and-forth with the model to get a video that’s close to your vision.
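To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The credit prices and the 1,000-credit allowance are the figures quoted above; the function names are made up for illustration, and the count of five edit rounds is purely an assumption standing in for “a few edits.”

```python
# Figures quoted in the article.
MONTHLY_CREDITS = 1000   # AI Pro plan allowance
GEN_COST_MIN = 15        # cheapest generation
GEN_COST_MAX = 40        # priciest generation
EDIT_COST = 40           # one round of edits


def spend_range(clips, edit_rounds):
    """Return (lowest, highest) possible credit spend for a session."""
    edits = edit_rounds * EDIT_COST
    return clips * GEN_COST_MIN + edits, clips * GEN_COST_MAX + edits


def remaining_range(clips, edit_rounds, budget=MONTHLY_CREDITS):
    """Return (worst case, best case) credits left after a session."""
    low, high = spend_range(clips, edit_rounds)
    return budget - high, budget - low


if __name__ == "__main__":
    # Assumption: 20 clips and 5 edit rounds; the article only says "a few edits".
    low, high = spend_range(20, 5)
    print(f"Spend between {low} and {high} credits")
    worst, best = remaining_range(20, 5)
    print(f"Between {worst} and {best} credits left")
```

Under those assumed counts, the session costs somewhere between 500 and 1,000 credits, which is consistent with the 855 credits actually spent (1,000 minus the 145 remaining).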
One of Omni’s purported strengths is adding AI-generated stuff to real videos, so I gave Buddy a break and deepfaked myself. Starting with a selfie video with a neutral expression, I prompted Omni to generate videos of me eating a plate of spaghetti, sitting in an airplane seat, and standing in front of the Eiffel Tower taking a bite out of a baguette. And I can genuinely say I wasn’t prepared for what I saw.
There are AI tells in my deepfake videos. The clink of the fork hitting the bowl of pasta is a little too manufactured. There’s a woman in the background of the airplane video who shows up twice. But aside from those little glitches and a vaguely uncanny sense about them, they’re convincing as hell.
I showed my husband the pasta clip; he knew I was testing an AI video tool, but I didn’t tell him what in the scene had been generated by AI. He bought that I was sitting in front of a camera eating pasta, and said that his only clue something was up was that the bowl looked unfamiliar. The pasta-eating itself looked real enough to convince my husband. A man who has looked at me in real life basically every single day for the last decade.
My other deepfakes are varying levels of “good enough to fool people on social media.” A couple of the Eiffel Tower clips look slightly cartoonish, but one of them is convincing enough that you might need to rewatch.