Apple Silicon costs more than OpenRouter

Offline Agentic Coding part 3: Apple Silicon costs more than OpenRouter. Published 2026-05-17.

Apple Silicon costs more than OpenRouter. At ~50-100 watts under load and ~$0.20 per kWh, my M5 MacBook Pro costs one to two cents per hour in electricity. Accelerated depreciation (if any) from shortening the lifespan of the device will be more expensive than the electricity. At a few tens of tokens per second this works out to an amortized cost of ~$1.50 per million tokens. OpenRouter serves comparable models at about 1/3 the price and ~2x the speed.

Electricity

In Northern Virginia my last electricity bill worked out to $0.18 per kilowatt-hour. Let's round up to $0.20 per kWh. For comparison, the EIA puts the 2025 average US residential price at $0.1730 per kWh. At ~50-100 watts and $0.18/kWh that's $0.009 to $0.018 per hour; call it $0.02 per hour at the top of the range. That comes to $0.48 per day in electricity to run inference at 100% around the clock.
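
The electricity arithmetic can be reproduced with a short sketch. The function name and the loop over rates are illustrative choices, not part of any measurement; the wattages and prices are the ones quoted in the text.

```python
def electricity_cost_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Cost in USD of drawing `watts` continuously for one hour."""
    return watts / 1000.0 * usd_per_kwh


# Power range and electricity prices taken from the text above.
for watts in (50, 100):
    for rate in (0.18, 0.20):
        hourly = electricity_cost_per_hour(watts, rate)
        daily = hourly * 24  # assumes inference runs around the clock
        print(f"{watts:>3} W at ${rate:.2f}/kWh: ${hourly:.3f}/hour, ${daily:.2f}/day")
```

The $0.48 per day figure corresponds to the top of the range: 100 watts at the rounded-up $0.20 per kWh.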

Hardware

A 14-inch MacBook Pro with an M5 Max and 64 GB of RAM is currently listed at $4299 on the Apple website. 128 GB will cost you more, but 64 GB should run a model like Gemma 4 31b, which is almost at Anthropic Sonnet levels of performance. For cost allocation, let's consider that this hardware will last 3, 5, or 10 years. The cost per year is $1433, $860, or $430 respectively. Spread over every hour of the year (8,760 hours), the hourly cost over 3, 5, and 10 years is thus:

$0.16358
$0.09815
$0.04908

As for useful lifespan, I think 5 years is a reasonable estimate for normal use, and 7 or 10 is very plausible. At maxed-out inference, 3 years may be a reasonable estimate as well.
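
As a sanity check on the amortization, here is a minimal sketch using straight-line depreciation. It assumes the purchase price is spread evenly over every hour of the lifespan (24 x 365 hours per year) with no resale value; both are simplifying assumptions, and the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes cost is spread over every hour, used or not


def hardware_cost_per_hour(price_usd: float, lifespan_years: float) -> float:
    """Straight-line hardware cost per hour, ignoring resale value."""
    return price_usd / (lifespan_years * HOURS_PER_YEAR)


PRICE_USD = 4299  # listed price quoted in the text
for years in (3, 5, 10):
    yearly = PRICE_USD / years
    hourly = hardware_cost_per_hour(PRICE_USD, years)
    print(f"{years:>2} years: ${yearly:.0f}/year, ${hourly:.5f}/hour")
```

If the laptop only runs inference for part of the day, the effective hourly hardware cost rises in proportion.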

Tokenomics

The big question is how many tokens per hour you can get out of a local model. In my testing, the M5 Max lands in the 10-40 tokens per second range for a serious model like Gemma4:31b. At 10 tokens per second that's 36,000 tokens per hour. Across our 3-10 year lifespans, with electricity at $0.18 per kWh and the 50 watt end of the power range, that gives a price per million tokens of $1.61 (10 years) to $4.79 (3 years). At 40 tokens per second that's 144,000 tokens per hour, which gets you to $0.40 to $1.20 per million tokens. For Apple Silicon, the hardware cost dominates.
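
The per-million-token figures can be reproduced by adding the hourly hardware and electricity costs and dividing by hourly token output. This is a sketch under stated assumptions: the quoted numbers match a 50 watt draw at $0.18 per kWh, which is inferred from the arithmetic rather than stated, and the function name and parameters are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def usd_per_million_tokens(
    tokens_per_second: float,
    price_usd: float,
    lifespan_years: float,
    watts: float,
    usd_per_kwh: float,
) -> float:
    """Amortized hardware plus electricity cost per one million generated tokens."""
    hardware_per_hour = price_usd / (lifespan_years * HOURS_PER_YEAR)
    electricity_per_hour = watts / 1000 * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return (hardware_per_hour + electricity_per_hour) / tokens_per_hour * 1_000_000


# 50 W and $0.18/kWh are assumptions inferred from the quoted figures.
for tps in (10, 40):
    for years in (3, 10):
        cost = usd_per_million_tokens(tps, 4299, years, watts=50, usd_per_kwh=0.18)
        print(f"{tps} tok/s, {years} years: ${cost:.2f} per million tokens")
```

Even at 100 watts the electricity term stays under two cents per hour, against roughly 5 to 16 cents per hour for the hardware, which is why the hardware cost dominates.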

OpenRouter

OpenRouter has Gemma 4 31b at ~38-50 cents per million tokens. This means that on the optimistic side (50 watts, 40 tokens per second, and 10 years) the M5 Max MacBook Pro is as cheap as OpenRouter. On the pessimistic side (100 watts, 10 tokens per second, and 3 years) it is about 10x the cost. From an accounting perspective, I think ~3x the cost per million tokens is likely the right number for local inference on this machine.
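
To make the optimistic and pessimistic comparisons concrete, the sketch below computes the local cost for both scenarios and divides by the OpenRouter price range. The $0.18 per kWh rate is an assumption carried over from the electricity section, and the helper and scenario names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def local_usd_per_million(tps: float, years: float, watts: float,
                          price_usd: float = 4299, usd_per_kwh: float = 0.18) -> float:
    """Local cost per million tokens: amortized hardware plus electricity."""
    hourly = price_usd / (years * HOURS_PER_YEAR) + watts / 1000 * usd_per_kwh
    return hourly / (tps * 3600) * 1_000_000


OPENROUTER_LOW, OPENROUTER_HIGH = 0.38, 0.50  # USD per million tokens, from the text

scenarios = {
    "optimistic": local_usd_per_million(tps=40, years=10, watts=50),
    "pessimistic": local_usd_per_million(tps=10, years=3, watts=100),
}

for name, local in scenarios.items():
    # Dividing by the higher cloud price gives the smallest ratio, and vice versa.
    low_ratio = local / OPENROUTER_HIGH
    high_ratio = local / OPENROUTER_LOW
    print(f"{name}: ${local:.2f} per million tokens, "
          f"{low_ratio:.1f}x to {high_ratio:.1f}x the OpenRouter price")
```

Under these assumptions the optimistic case lands inside the OpenRouter price range, and the pessimistic case comes out at a little over 10x the higher OpenRouter price.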

Conclusion

For most cases, though, speed of inference is the biggest factor. Local inference is slower than cloud inference. Some of the Gemma 4 providers on OpenRouter get up to 60-70 tokens per second, which is 3-7 times faster than what I'm seeing with the M5 Max (~10-20 tokens per second). For a human employee with a work laptop, salary costs are going to be ~1000x the cost of the tokens they can generate locally. Throwing money at Anthropic makes more sense in this context. It's still wild that a consumer device can run models that are close to Anthropic Sonnet levels of performance.
