Introducing Gemini Omni

Gemini Omni Flash is a model that can create anything from any input, starting with video.

Last year, Nano Banana brought Gemini’s intelligence to image generation and editing. Since then, it’s helped millions of people restore old photos, design from sketches and visualize ideas in ways that weren’t possible before. We built Gemini to be natively multimodal from the ground up, and now we’re taking the next step.

We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input, starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini’s real-world knowledge. You can also easily edit your videos through conversation.

Today, we’re rolling out the first model in the Omni family, Gemini Omni Flash, to the Gemini app, Google Flow and YouTube Shorts. In time, we will support output modalities like image and audio. Here’s some of what makes Omni special:

Edit your videos through conversation

Gemini Omni gives you an easier way to edit video: natural language. Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.

Transform the world around you. Change specific things, or change everything. Your video becomes the starting point for something you never could have filmed yourself.

  • Prompt: Make the sculpture out of bubbles.

Reimagine the action. Take a video you shot and just ask Omni to change what’s happening. Edit the action, add in new characters or objects, or transform a moment into something unexpected.

  • Prompt: When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person’s arm turns into reflective mirror material.

  • Prompt: Dim the lights in the room. Put a black and white checkerboard room inside a glass sphere that floats tracking above the hand, inside it contains a recursive representation of the same hand holding the sphere, creating an infinite recursive of rooms. Camera slowly gets closer into the sphere, creating a video loop.

  • Prompt: The lights of the apartments start turning on in sync with the music.

Refine your videos across multiple turns. Change the environment, angle, style or even specific details, without ever losing the thread of your original scene. Scroll through the carousel to see how edits build on each other.

  • Prompt: A video of a violinist playing a song.
  • Prompt: Transport the violinist to the image environment.
  • Prompt: Make the violin invisible.
  • Prompt: Change the camera angle to be over the violinist’s shoulder.

Bring ideas to life, grounded in Gemini’s world knowledge

Gemini Omni doesn’t just build scenes that look real; it reasons about what should happen next. It combines an intuitive understanding of physics with Gemini’s knowledge of history, science and cultural context, bridging the gap from photorealism to meaningful storytelling.

Create visuals with more accurate physics. Omni has an improved intuitive understanding of physical phenomena like gravity, kinetic energy and fluid dynamics, allowing you to create more realistic scenes.

  • Prompt: A marble rolling fast on a chain reaction style track, continuous smooth shot.

Blend knowledge and creativity. Omni draws on Gemini’s knowledge to connect language, imagery and meaning in ways that go far beyond pattern matching.

  • Prompt: The video shows items of the alphabet. An unusual item starting with each letter is shown sitting on a table (like a Capybara for C, disco globe for D and Lava Lamp for L). All 26 letters must be represented by 26 items with matching lower thirds displaying the letter. Only one item and lower third at a time. Each lower third must look like a black marker written on a slip of paper in the bottom left. Rapid fire, roughly 9 frames per item at 24FPS. Last frame is a slip of paper “THE END”. The whole video is accompanied by calm smooth music.
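
The timing constraints in the prompt above imply a very short clip. As a rough check, the arithmetic can be sketched in Python. The figures (26 items, roughly 9 frames per item, 24 FPS) come from the prompt itself; the function name `clip_timing` is hypothetical and used only for illustration, and the closing “THE END” frame is left out of the count.

```python
# Back-of-the-envelope timing for the alphabet prompt.
# All three figures are taken from the prompt text; nothing here
# describes how Omni itself schedules frames.
ITEMS = 26            # one item per letter
FRAMES_PER_ITEM = 9   # "roughly 9 frames per item"
FPS = 24              # "at 24FPS"


def clip_timing(items: int, frames_per_item: int, fps: int) -> tuple[int, float, float]:
    """Return (total frames, seconds per item, total seconds)."""
    total_frames = items * frames_per_item
    return total_frames, frames_per_item / fps, total_frames / fps


total_frames, seconds_per_item, total_seconds = clip_timing(ITEMS, FRAMES_PER_ITEM, FPS)
print(f"{total_frames} frames, {seconds_per_item:.3f} s per item, {total_seconds:.2f} s total")
# prints "234 frames, 0.375 s per item, 9.75 s total"
```

Each item is on screen for well under half a second, so the prompt asks for 26 distinct, legible item-and-caption pairs in just under ten seconds.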

Complex ideas made visual. Omni can create compelling explainers from short prompts, generating visuals that break down more complex ideas.

  • Prompt: claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate.

Create videos from any combination of inputs

Reference anything. Omni turns any reference (image, text, video or audio) into a single, cohesive output. For audio, only voice references will be supported to start; we’ll roll out other types of audio inputs soon.

  • Prompt: Dynamic sci-fi film style video based on image_0.png. Elements light up similar to video_0.mp4 synchronized to the beat of the music from audio_0.wav.

  • Prompt: Referring to the extreme camera movement, perspective, and distortion in video-0, create a front-facing full-body walk cycle of the character from image-0, quickly style-shifting into multiple visual styles during the walk cycle, starting from realistic cinema. Keep the environment, only change styles. Hard cut backgrounds always centering the sky. Continuous walking, continuous audio, and style shifts in perfect sync to the beat of the audio. Cinematic, 16:9.

  • Prompt: Add harp sounds synchronized to when I touch each fern leaf. Change the leaf structure to all resemble semi translucent 3d bioluminescent plant life, with bioluminescent fireflies flying around it that react as I play, in sync with the sounds, subtle bokeh depth of field dynamic lighting, reflecting off the walls in the room, keeping the room structure the same.

Start from what you have. With input references, you can use images of characters, scenes or drawings to create in a way that matches your vision.

  • Prompt: Imagine the world gradually changing into retro futuristic style (grainy and moody as image-1) as I walk. Use the audio for a retro-futuristic.