Local AI needs to be the norm

One of the current trends in modern software is for developers to slap an API call to OpenAI or Anthropic onto features within their app. Reasonable people can quibble over whether those features actually bring value to users, but what I want to discuss is the fundamental concept of taking on a dependency on a cloud-hosted AI model in your applications. This laziness is creating a generation of software that is fragile, invades your privacy, and is fundamentally broken. We are building applications that stop working the moment a server crashes or a credit card expires.

We need to return to a habit of building software where our local devices do the work. The silicon in our pocket is mind-bogglingly faster than what was available a decade ago. It has a dedicated Neural Engine sitting there, mostly idle, while we wait for a JSON response from a server farm in Virginia. That’s ridiculous. Even if your intentions are pure, the moment you stream user content to a third-party AI provider, you’ve changed the nature of your product. You now have data retention questions and all the baggage that comes with that (consent, audit, breach, government request, training, etc.).

On top of that, you’ve also substantially complicated your stack, because your feature now depends on network conditions, external vendor uptime, rate limits, account billing, and your own backend health. Congratulations! You took a UX feature and turned it into a distributed system that costs you money. If the feature can be done locally, opting into this mess is self-inflicted damage. “AI everywhere” is not the goal. Useful software is the goal.

Concrete Example: Brutalist Report’s On-Device Summaries

Years ago I launched a fun side project named The Brutalist Report, a news aggregator service inspired by the 1990s-style web. Recently, I decided to build a native iOS client for it with the design goal of ensuring it would remain a high-density news reading experience: headlines in a stark list, a reader mode that strips the cancer that has overtaken the web, and (optionally) an “intelligence” view that generates a summary of the article.

Here’s the key point though: the summary is generated on-device using Apple’s local model APIs. No server detours. No prompt or user logs. No vendor account. No “we store your content for 30 days” footnotes needed. It has become so normal for folks to assume that any AI use happens server-side; we have a lot of work to do as an industry to turn this around. It’s not lost on me that some use cases will demand the intelligence that only a cloud-hosted model can provide, but that isn’t true of every use case you’re trying to solve. We need to be thoughtful here.

Available Tooling

I can only speak to the tooling available within the Apple ecosystem, since that’s where I focused my initial development efforts. In the last year, Apple has invested heavily here to let developers easily make use of a built-in local AI model. The core flow looks roughly like this:

import FoundationModels

// Bail out if the on-device model isn't available on this hardware or OS.
let model = SystemLanguageModel.default
guard model.availability == .available else { return }

// Session-wide instructions: the role and output format for every request.
let session = LanguageModelSession {
    """
    Provide a brutalist, information-dense summary in Markdown format.
    - Use **bold** for key concepts.
    - Use bullet points for facts.
    - No fluff. Just facts.
    """
}

// The article text is the prompt; cap the response so summaries stay tight.
let response = try await session.respond(options: .init(maximumResponseTokens: 1_000)) { articleText }
let markdown = response.content

And for longer content, we can chunk the plain text (around 10k characters per chunk), produce concise “facts only” notes per chunk, then run a second pass to combine them into a final summary. This is the kind of work local models are perfect for. The input data is already on the device (because the user is reading it). The output is lightweight. It’s fast and private. It’s okay if it’s not a superhuman PhD-level intelligence, because it’s summarizing the page you just loaded, not inventing world knowledge. Local AI shines when the model’s job is transforming user-owned data, not acting as a search engine for the universe.

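Here’s a minimal sketch of that two-pass flow, using the same session API as the snippet above. The chunking helper, chunk size, and prompts are illustrative rather than the app’s exact implementation:

import Foundation
import FoundationModels

// Illustrative helper: split plain text into roughly 10k-character chunks on paragraph boundaries.
func chunked(_ text: String, size: Int = 10_000) -> [String] {
    var chunks: [String] = []
    var current = ""
    for paragraph in text.components(separatedBy: "\n\n") {
        if !current.isEmpty, current.count + paragraph.count > size {
            chunks.append(current)
            current = ""
        }
        current += paragraph + "\n\n"
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

// First pass: terse "facts only" notes per chunk, each in a fresh session
// so a long article never blows past the context window.
var notes: [String] = []
for chunk in chunked(articleText) {
    let session = LanguageModelSession {
        "Extract the key facts from the text as terse bullet points. No fluff."
    }
    let response = try await session.respond(options: .init(maximumResponseTokens: 300)) { chunk }
    notes.append(response.content)
}

// Second pass: combine the per-chunk notes into the final summary.
let combiner = LanguageModelSession {
    "Combine these notes into a single information-dense summary in Markdown."
}
let summary = try await combiner.respond(options: .init(maximumResponseTokens: 1_000)) {
    notes.joined(separator: "\n")
}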

There are plenty of AI features that people want but don’t trust: summarizing emails, extracting action items from notes, categorizing a document, and so on. The usual cloud approach turns every one of those into a trust exercise. “Please send your data to our servers. We promise to be cool about it.” Local AI changes that. Your device already has the data. We’ll do the work right here. You don’t build trust with your users by writing a 2,000-word privacy policy. You build trust by not needing one to begin with.

The tooling available on the platform goes even further. One of the best moves Apple has made recently is pushing “AI output” away from unstructured blobs of text and toward typed data. Instead of “ask the model for JSON and pray”, the newer and better pattern is to define a Swift struct that represents the thing you want. Give the model guidance for each field in natural language. Ask the model to generate an instance of that type. That’s it.

Conceptually, it looks like this:

import FoundationModels

// A @Generable type describes the exact shape of the output;
// @Guide gives the model natural-language guidance for each field.
@Generable struct ArticleIntel {
    @Guide(description: "One sentence. No hype.") var tldr: String
    @Guide(description: "3–7 bullets. Facts only.") var bullets: [String]
    @Guide(description: "Keywords for the article.") var keywords: [String]
}

let session = LanguageModelSession()
let response = try await session.respond(generating: ArticleIntel.self) {
    "Extract structured notes from the article."
    articleText
}
let intel = response.content // a typed ArticleIntel, not a blob of text

Now your UI doesn’t have to scrape bullet points out of Markdown or hope the model remembered your JSON schema. You get a real type with real fields, and you can render it consistently. It produces structured output your app can actually use. And it’s all running locally! This isn’t just nicer ergonomics. It’s an engineering improvement.

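To make that concrete, here’s a minimal sketch of rendering the typed result directly in SwiftUI (the view name and layout are illustrative, not the app’s actual UI):

import SwiftUI

struct IntelView: View {
    let intel: ArticleIntel

    var body: some View {
        VStack(alignment: .leading, spacing: 8) {
            // Real fields, rendered directly: no Markdown scraping, no JSON decoding.
            Text(intel.tldr).bold()
            ForEach(intel.bullets, id: \.self) { bullet in
                Text("• \(bullet)")
            }
            Text(intel.keywords.joined(separator: ", "))
                .font(.footnote)
        }
    }
}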