Apple working to cram massive Gemini model into iPhone to power new Siri

It’s no longer possible to avoid generative AI entirely when interacting with technology, but Apple’s products have a bit less of it. That’s not entirely by choice, though. The iPhone maker has delayed the AI-enhanced Siri multiple times since first promising it in 2024, but a deal with Google will merge the iconic assistant with Gemini later this year. As we approach the Worldwide Developers Conference, Apple has been working to bring big AI smarts to the modest processing environment of a smartphone. Apple fans may not like the outcome, though.

Apple has long crowed about the privacy value of running AI locally, but a new report suggests that despite Apple’s best efforts, the iPhone’s Gemini makeover will lean heavily on Google and Nvidia in the cloud. The Information reports that Apple’s Gemini-infused Siri will run both on-device and in the cloud, an apparent reversal of its privacy-focused preference for local AI.

With every new chip announcement, we hear about how the silicon has been optimized for AI—even Apple does this with its focus on Neural Engine upgrades. You may think from the grandiose language that smartphones are equipped to handle beefy AI models, but that’s not necessarily the case. In fact, the GPUs in most phones can process more AI tokens than the AI-focused NPUs. Components like Apple’s Neural Engine are designed for contextual, efficient AI processing. Even if phones had faster AI processing, they lack the RAM to keep enormous models in memory.
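
To make the RAM constraint concrete, here is a rough back-of-the-envelope sketch. It estimates only the memory needed to hold a model's weights, ignoring activations, caches, and the operating system. The parameter counts and bit widths are illustrative assumptions, not figures for any specific Apple or Google model, and `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB.

    Assumes every parameter is stored at the same bit width.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Illustrative sizes only (assumptions, not reported specs).
examples = [("3-billion-parameter model", 3e9),
            ("1-trillion-parameter model", 1e12)]

for label, params in examples:
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{label} at {bits}-bit: {gib:,.1f} GiB")
```

Under these assumptions, a model with a few billion parameters can fit in a phone's RAM once its precision is reduced, while a trillion-parameter model needs hundreds of gigabytes even at 4 bits per weight.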

Even the largest AI models are still middling assistants, and that makes local AI very challenging. The AI models that run on phones are much smaller, featuring at most a few billion parameters. Compare that to Google’s latest Gemini models, which have trillions of parameters, The Information reports. On-device AI models are also “quantized” to run at lower numerical precision, making them faster at some cost to the accuracy of token generation. This all adds up to AIs that feel less smart than their cloud brethren, and even big cloud-based models can be pretty dumb sometimes.
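
The trade-off behind quantization can be illustrated with a small sketch. It applies symmetric uniform quantization to a list of random stand-in weights and measures how far the values drift after a round trip. This is one common textbook scheme chosen for illustration; the report does not say which quantization method Apple or Google uses, and the function names and weight distribution here are assumptions.

```python
import random


def quantize(weights, bits):
    """Symmetric uniform quantization: map floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Stand-in weights drawn from a narrow Gaussian (an assumption).
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

for bits in (8, 4):
    q, scale = quantize(weights, bits)
    err = mean_abs_error(weights, dequantize(q, scale))
    print(f"{bits}-bit: mean absolute round-trip error = {err:.6f}")
```

Fewer bits mean coarser steps between representable values, so the 4-bit round trip loses more information than the 8-bit one. That lost precision is where the accuracy cost comes from.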

The amazing, shrinking Gemini

Google has versions of Gemini optimized for mobile devices, which it calls Gemini Nano. However, these are designed for powering contextual features like Magic Cue and audio summarization. Siri, on the other hand, is supposed to be a conversational assistant—you talk to it and it does things. That’s a different experience that requires a different kind of model. On Android, Google doesn’t even bother trying to do that locally. Talking to Gemini always goes straight to the cloud.

After inking the Google deal, Apple apparently got to work distilling Google’s giant cloud-based Gemini models. Distillation is a process in which a small, less resource-intensive model learns to mimic a large, expensive one. With enough time, this can reliably transfer useful capabilities to a model that carries far fewer weights than the original. That may enable Siri to handle some tasks with private local compute, but a cloud component looks inevitable.
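
As a rough illustration of what "learning to mimic" means, the sketch below computes a classic soft-target distillation loss: the KL divergence between a teacher's and a student's temperature-softened output distributions. This is the generic textbook formulation, not a description of Apple's or Google's actual pipeline, and the logits, temperature, and function names are all made up for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a four-word vocabulary.
teacher = [4.0, 1.0, 0.5, -2.0]
close_student = [3.5, 1.2, 0.4, -1.8]  # roughly agrees with the teacher
far_student = [0.0, 3.0, -1.0, 2.0]    # disagrees with the teacher

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

Training drives this loss toward zero across many examples, so the small model's outputs come to resemble the large model's even though it has far fewer parameters to work with.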

Processing users’ AI data in the cloud could be a problem for Apple. At WWDC, the company will probably promote its years of experience designing chips and how well that positions it for AI. However, The Information claims that Apple has struggled to even get Google’s massive undistilled Gemini models running on its custom Private Cloud Compute infrastructure, which is built on M-series Mac chips.

When the smarter Siri rolls out, it will probably route more complex tasks to Google’s cloud infrastructure instead of Apple’s, but it won’t be running on Google TPUs. Apple has reportedly signed a deal with Nvidia to use its Confidential Computing platform for this purpose. Confidential Computing keeps data encrypted on Nvidia GPUs while it’s being processed in the cloud, which could help Apple claim it’s still sensitive to user privacy concerns. It might even retain its own Private Cloud Compute branding for the system.

The iPhone probably won’t tell you which version of Gemini is handling individual Siri requests. Device makers designing hybrid systems that rely on local and cloud-based AI like to talk about making the experience feel “seamless.” There might be clues, though. We’re all familiar with the sluggishness of big AI models, which can churn for a long time while they generate tokens. Nvidia’s fully encrypted Confidential Computing does slow processing compared to other AI options. Users may find it more noticeable when Siri has to talk to a remote server, but local AI will only get you so far when the best models can only run on multi-million-dollar servers.
