The future of Siri, or: why private inference isn’t private enough

Yesterday Apple announced a big step towards deploying real AI in its Siri ecosystem. In most ways this is good and inevitable: Siri is one of the world’s most widely used voice agents, and it would be good if it didn’t suck. The idea that Apple would boost its capabilities with frontier models wasn’t so much a matter of if, but a question of when and who.

The who turns out to be Google: Apple looks like it will use some combination of Google Gemini models, Google’s Confidential Inference, and Apple’s own Private Cloud Compute for private hosting. These systems will both process your queries and evaluate private data from your devices.

Apple’s marketing pitches the advantages as follows: First, since your phone already has context about you (meaning your private information, schedules, email, and text messages), an AI-enabled Siri can potentially offer more useful answers to your practical requests than external LLMs. Want to schedule a reservation for next week’s birthday party? In theory, a future Siri-AI might already know who’s coming, and what kind of cake they like.

Of course, what Apple calls “context” is also the raw data of your life. This is deeply private data from all of your apps, and that data can’t just be shipped to random adtech companies (or Sam Altman) for processing. Your context needs to be protected, and Apple bills itself as a privacy company. There’s some tension between these goals.

Apple has addressed this by marketing a service it calls Private Cloud Compute, or PCC. PCC was introduced in 2024 as a private model inference system that ran entirely on Apple Silicon, using a set of “trusted” hardware security modules running in Apple’s datacenters. The goal of this system is to ensure that your data never leaves Apple’s hardware: it’s encrypted from your phone to a dedicated server, and then it disappears once a response reaches your phone.

The stateless design of PCC ensures (in theory) that your data doesn’t linger, and the design of the hardware prevents even Apple from seeing the inputs. Apple has since “expanded” PCC to encompass Google’s hardware as well. I will confess that I find the details of the new “expanded” PCC just a bit vague. It sounds a lot like Apple is primarily going to rely on Google’s existing confidential compute (running in Google datacenters) to process this data, but they’re bolting on a new layer of technical security to control which models are actually running.

In any case: security experts can argue about whether this is good enough to keep Cozy Bear away from your data. What I will grant is that it’s probably good enough to keep Google and Apple from accessing your stuff, which is what most people are worried about in the first place. So why am I so nervous?

A brief scenario involving private agents

To illustrate how agents might work, it’s helpful to consider an example use case. Let’s imagine that you’re planning a business dinner for six people. This involves several subtasks:

1. You need to juggle the participants’ schedules, and know when they’re in town and available to meet.
2. You need to choose the appropriate restaurant based on menu and location. This might depend on what you know about the participants’ preferences: Mike is wildly allergic to szechuan peppercorn, for example, which rules out quite a few options.
3. With these time/cuisine/location constraints in place, you’ll need to search for a restaurant that actually has a table for six in the right place.
4. Finally, you’ll need to book the reservation, mark your calendar, and alert your attendees.

In the past, this type of scheduling required a significant amount of human effort. The beauty of AI agents is that, in theory, this is exactly the sort of project that can be automated.
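
Stripped of the natural-language parts, the subtasks above amount to a small constraint-filtering problem. The following is only a rough sketch of the first three steps, not a description of how Siri or any real agent works. The `Restaurant` type, the `common_evenings` and `plan_dinner` functions, and all of the sample data are hypothetical. Booking and notifying guests are left out, since they would need your approval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restaurant:
    name: str
    neighborhood: str
    ingredients: frozenset[str]          # notable ingredients on the menu
    largest_free_table: dict[str, int]   # evening -> seats at the largest free table


def common_evenings(availability: dict[str, set[str]]) -> set[str]:
    """Step 1: evenings on which every participant is free."""
    if not availability:
        return set()
    return set.intersection(*availability.values())


def plan_dinner(
    availability: dict[str, set[str]],
    allergens: set[str],
    restaurants: list[Restaurant],
    neighborhood: str,
    party_size: int,
) -> Optional[tuple[str, str]]:
    """Steps 2 and 3: first (evening, restaurant name) meeting every constraint."""
    for evening in sorted(common_evenings(availability)):
        for r in restaurants:
            if r.neighborhood != neighborhood:
                continue  # wrong location
            if r.ingredients & allergens:
                continue  # menu features something a guest is allergic to
            if r.largest_free_table.get(evening, 0) >= party_size:
                return evening, r.name
    return None


# Hypothetical sample data (dates are month-day strings).
availability = {
    "you": {"06-10", "06-11", "06-12"},
    "Mike": {"06-11", "06-12"},
    "Dana": {"06-11", "06-12"},
}
restaurants = [
    Restaurant("Chili House", "downtown",
               frozenset({"szechuan peppercorn"}), {"06-11": 8, "06-12": 8}),
    Restaurant("Trattoria Uno", "downtown",
               frozenset({"basil"}), {"06-11": 4, "06-12": 6}),
]
choice = plan_dinner(availability, {"szechuan peppercorn"}, restaurants, "downtown", 6)
print(choice)
```

The filtering itself is trivial. What is hard to come by is the input: the availability table and the allergen set both have to be extracted from somewhere.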

The agent can first scan your recent conversations to answer the questions needed for steps (1) & (2), then it can conduct the searches described in step (3). With a nod from you, it can even author the calendar invites and text messages required to complete step (4). So what’s the problem here?

The first and unsurprising observation is that being useful on these tasks requires your agent to have context, which means: relatively unrestricted access to your private data. You know about your invitees’ availability because they texted it to you. You know about Mike’s allergy because you’ve talked about it with him or jotted it down somewhere. (This could mean iMessages, email, contacts, or personal notes.)

Re-entering all of this data into an agent would be annoying and time-consuming, and the whole point of an agent is to save you time. The winning personal assistant doesn’t win just because it’s smart: it wins because it “already knows” the things you need it to know, like a personal assistant who sits next to your desk.

Allow me to dig into the details just a bit deeper. The agent might scan your messages database to learn the parameters needed to schedule your dinner. Or, in a more token-efficient system, it might read your messages continuously and store a “memory” that distills useful facts that it might need later. Both can be functionally equivalent, but one produces an artifact that may be highly sensitive.
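
To make the difference between those two designs concrete, here is a minimal sketch. It assumes a toy keyword match in place of real LLM inference. The `Message` and `MemoryStore` types, the `scan_on_demand` function, the `mentions_mike` heuristic, and the sample messages are all hypothetical illustrations, not any vendor's actual design.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    text: str


def _matcher(keywords):
    """Case-insensitive pattern matching any of the given keywords."""
    return re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)


def scan_on_demand(messages, keywords):
    """Design A: read the raw messages at query time and keep nothing afterwards."""
    pattern = _matcher(keywords)
    return [f"{m.sender}: {m.text}" for m in messages if pattern.search(m.text)]


@dataclass
class MemoryStore:
    """Design B: distill 'useful' facts as messages arrive and persist them."""
    facts: list[str] = field(default_factory=list)

    def ingest(self, message, looks_useful):
        # The store cannot know which query will come later, so the
        # usefulness test has to be broad.
        if looks_useful(message):
            self.facts.append(f"{message.sender}: {message.text}")

    def recall(self, keywords):
        pattern = _matcher(keywords)
        return [fact for fact in self.facts if pattern.search(fact)]


messages = [
    Message("Mike", "Reminder: I'm wildly allergic to szechuan peppercorn."),
    Message("Sam", "Between us, I think Mike is having an affair."),
    Message("Dana", "Running late today."),
]


def mentions_mike(m):
    return m.sender == "Mike" or "Mike" in m.text


memory = MemoryStore()
for m in messages:
    memory.ingest(m, mentions_mike)

on_demand = scan_on_demand(messages, ["allergic"])
recalled = memory.recall(["allergic"])
print(on_demand == recalled)   # both designs give the same answer to the dinner query
print(len(memory.facts))       # but design B has also persisted the affair message
```

For the dinner query the two designs return the same thing. The difference is what is left behind: the memory store now holds a distilled record of everything its heuristic judged potentially useful, whether or not any query ever needs it.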

And keep in mind that the set of facts that might be useful is very broad. Mike’s allergy is one of those facts, but there are many others. The private conversation in which you discovered that Mike was having an affair, for example, is another fact that could be stored or accessed by the system. Memory or not, this data will all be within the agent’s view, and you’ll have to hope that it knows which facts to operate on.

With this data at its fingertips, your agent (which is really an LLM running on a server in a data center somewhere, combined with a bunch of local state and prompting) will need to perform inference over this data, either to summarize it, or to respond to the query itself. This is where Private Cloud Compute and Confidential Inference are de…