Google announces Gemini 3.5 Live Translate for instant voice-to-voice translation

Google has been chasing real-time translation for years, describing it as one of its “pioneering machine learning experiments.” We’ve seen numerous demos on stage at Google events in the past, but you needed Google phones, earbuds, or some other specific setup. Last year, Google brought real-time translation to more users in the Translate app, and now it’s expanding availability further.

With the release of Gemini 3.5 Live Translate, you’ll have access to instant translation in more places and with lower latency than ever before. The new AI model is part of the version 3.5 family that launched at I/O. Before today, Google had only rolled out the Flash version, but we’re expecting a Pro model to drop in the coming weeks.

Gemini 3.5 Live Translate is a speech-to-speech model tuned to automatically detect and translate more than 70 languages. Google says Gemini 3.5 Live Translate is fast enough to keep up with a normal conversation, following just a few seconds behind the speaker while also matching intonation, pacing, and pitch. In short, the voice sounds more like you than a generic robot. The demos, which were all recorded under controlled conditions, do sound impressive. You won’t have to wait long to verify the model’s abilities for yourself, though.

Gemini 3.5 Live Translate is rolling out across several parts of the Google ecosystem. Developers can begin building with a public preview through the Gemini Live API or AI Studio. The model processes speech continuously and handles multilingual input automatically, saving developers from manually configuring language settings. It also filters out background noise in busy environments.

Select enterprise customers will also get access to the new translation model in Google Meet starting this month, ahead of a wider rollout. Google says it’s tweaking the Meet interface to bring the live translate feature to the front, too. Most notably, 3.5 Live Translate will come to the Google Translate app on both Android and iOS soon.

At the tail end of last year, Google began testing Gemini-based live translation in the app with any earbuds (and in the iOS app); previously, you needed to have the company’s Pixel Buds with an Android phone. The pending update will expand this further with the addition of the latest 3.5 model. Not only can you use any earbuds, you don’t need earbuds at all. If you don’t have any handy, you can hold the phone up to your ear like you’re on a call to hear the spoken translation. However, this “listening mode” only works on Android at this time.

The audio streams from Gemini 3.5 Live Translate are intended to sound lifelike even if they don’t exactly mimic the user’s voice. However, Google is still proceeding cautiously. All Gemini 3.5 Live Translate audio streams will have SynthID watermarks integrated into the waveform data. This will mark the speech as AI-generated, and there is (currently) no way to remove that.