Uber Burned Through Its Entire AI Coding Budget in 4 Months. Here's What Smart Teams Do Instead.
The AI coding bill just became everyone’s problem. In the last two weeks alone: Uber blew through its entire 2026 Claude Code budget by April and capped employees at $1,500/month. Gartner reported that 23% of tech leaders now spend $200-500 per developer per month on AI coding tokens alone. GitHub flipped Copilot to usage-based billing, turning a predictable $19/seat into an open-ended credit drain. Ramp’s AI Index shows the top 1% of firms spending $7,500/employee/month on AI — $90K/year per head, up 14.1% in a single month. The pattern is clear: agentic workflows burn tokens faster than any flat budget anticipated. And single-vendor lock-in makes it worse — when your only option is Opus 4.8 at $75/M output tokens, every wasted thinking loop is expensive.
The Real Problem: Not All Tasks Need the Best Model
Here’s what I learned after watching my own AI coding spend hit $10K/month earlier this year. I was sending everything to Claude Opus. Code planning? Opus. Writing unit tests? Opus. Formatting a config file? Opus. Renaming a variable across three files? Opus. That’s like hiring a senior architect to move furniture. The work gets done, but you’re massively overpaying.
When I actually profiled my usage, the breakdown looked like this:
- ~15% of tasks genuinely needed frontier reasoning (complex architecture decisions, subtle bug diagnosis, multi-file refactors with tricky dependencies)
- ~25% of tasks needed solid mid-tier capability (implementing features from clear specs, writing meaningful tests, code review)
- ~60% of tasks were mechanical (formatting, renaming, boilerplate generation, simple file operations, documentation updates)
That 60% was burning frontier-tier tokens on work that Haiku, Gemini Flash, or even a local model could handle just as well.
Task-Level Routing: The Boring Fix That Saves 60-70%
The concept is simple: instead of routing every request to one model, classify each task and send it to the cheapest model that can handle it well.
- Planning phase → Frontier model (Opus, GPT-5). This is where reasoning depth matters. You want the model that catches edge cases your spec missed.
- Implementation → Mid-tier model (Sonnet, GPT-4.1). Given a clear plan, most code generation doesn’t need maximum intelligence; it needs reliable instruction-following.
- Tests, formatting, docs → Fast/cheap model (Haiku, Flash, Gemini 2.5). These tasks have objectively verifiable outputs. Either the test passes or it doesn’t. You don’t need 200 IQ for assertEqual.
- Debug/diagnosis → Frontier model again. When something breaks in a non-obvious way, you want the best reasoning available.
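The four rules above can be sketched as a small lookup-based router. This is a minimal illustration, not a production classifier: the task labels, the `route` function, and the model identifiers are all hypothetical placeholders, and a real setup would need some way to decide the task label in the first place.

```python
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"
    MID = "mid"
    FAST = "fast"


# Hypothetical task labels mapped to tiers, following the list above.
TASK_TIERS = {
    "planning": Tier.FRONTIER,
    "debugging": Tier.FRONTIER,
    "implementation": Tier.MID,
    "tests": Tier.FAST,
    "formatting": Tier.FAST,
    "docs": Tier.FAST,
}

# Placeholder model identifiers; substitute the names your provider exposes.
TIER_MODELS = {
    Tier.FRONTIER: "frontier-model",
    Tier.MID: "mid-tier-model",
    Tier.FAST: "fast-model",
}


def route(task_type: str) -> str:
    """Return the model for a task label.

    Unknown labels fall back to the frontier tier (an assumption made here),
    so an unclassified task costs more rather than silently losing quality.
    """
    tier = TASK_TIERS.get(task_type, Tier.FRONTIER)
    return TIER_MODELS[tier]


if __name__ == "__main__":
    for task in ("planning", "implementation", "tests", "something-new"):
        print(task, "->", route(task))
```

Falling back to the frontier tier is a conservative choice. Defaulting to the mid tier would save more but risks weaker answers on tasks nobody has labeled yet.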
After implementing this approach, my monthly spend dropped from ~$10K to ~$3K, a cut of about 70%. Same output quality. Same velocity. I just stopped overpaying for routine work.
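To see why savings of this size are plausible, here is a back-of-the-envelope sketch of the blended price per million output tokens. Only the $75 frontier price comes from this article. The mid-tier and fast-tier prices are hypothetical placeholders, and treating the 15/25/60 task split as a split of output tokens is a simplifying assumption.

```python
# Output-token prices in USD per million tokens.
# Only the frontier figure is from the article; the others are placeholders.
PRICE_PER_M = {"frontier": 75.0, "mid": 15.0, "fast": 4.0}

# Share of work per tier, taken from the usage breakdown earlier and
# assumed (for simplicity) to equal each tier's share of output tokens.
TOKEN_SHARE = {"frontier": 0.15, "mid": 0.25, "fast": 0.60}


def blended_price(prices: dict, shares: dict) -> float:
    """Weighted average price per million tokens across tiers."""
    return sum(prices[tier] * shares[tier] for tier in shares)


def savings_vs_frontier(prices: dict, shares: dict) -> float:
    """Fraction saved compared with sending everything to the frontier tier."""
    return 1 - blended_price(prices, shares) / prices["frontier"]


if __name__ == "__main__":
    print(f"blended: ${blended_price(PRICE_PER_M, TOKEN_SHARE):.2f} per M tokens")
    print(f"savings: {savings_vs_frontier(PRICE_PER_M, TOKEN_SHARE):.1%}")
```

With these placeholder numbers the blended price works out to about $17.40 per million output tokens, roughly 77% below the all-frontier price. Real savings will differ because input tokens, caching, and retries on cheaper models all shift the result.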
How to Actually Do This
You don’t need custom infrastructure. Here’s the practical version:
- Audit Your Token Usage: Before optimizing, know where your tokens go. Log the actual prompts hitting the API for a week. You’ll probably find context bloat, unnecessary thinking loops, and repeated system prompts.
- Create Task Categories: Start simple. Three tiers are enough: Tier 1 (Frontier), Tier 2 (Mid), Tier 3 (Fast).
- Route Based on the Task, Not the Session: The key insight is that routing should happen at the task level. A single coding session might need Opus for design, Sonnet for implementation, and Haiku for tests.
- Monitor and Adjust: Track cost-per-task, not just total spend. When you see a Tier 3 task consuming $2 worth of tokens on a frontier model, that’s a routing failure.
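The audit and monitoring steps above can be sketched with a few lines of bookkeeping. This assumes you already log one record per API call. The `CallRecord` fields, the function names, and the $1 cost floor are hypothetical choices for illustration, with tier numbers following the Tier 1 to Tier 3 scheme above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    task_type: str    # hypothetical label, e.g. "formatting"
    tier_needed: int  # 1 = frontier, 2 = mid, 3 = fast
    model_tier: int   # tier of the model that actually served the call
    cost_usd: float   # cost of the call, computed from your own billing data


def cost_per_task_type(records):
    """Total spend grouped by task label."""
    totals = defaultdict(float)
    for record in records:
        totals[record.task_type] += record.cost_usd
    return dict(totals)


def routing_failures(records, min_cost_usd=1.0):
    """Calls served by a pricier tier than the task needed, above a cost floor."""
    return [
        record
        for record in records
        if record.model_tier < record.tier_needed
        and record.cost_usd >= min_cost_usd
    ]


if __name__ == "__main__":
    log = [
        CallRecord("formatting", tier_needed=3, model_tier=1, cost_usd=2.00),
        CallRecord("planning", tier_needed=1, model_tier=1, cost_usd=4.50),
        CallRecord("tests", tier_needed=3, model_tier=3, cost_usd=0.05),
    ]
    print(cost_per_task_type(log))
    for failure in routing_failures(log):
        print("routing failure:", failure)
```

In this sample log only the formatting call is flagged: a Tier 3 task that cost $2 on a frontier model, which is the failure pattern described above.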
The Bigger Picture
Ramp’s data tells an interesting story: the companies spending the most on AI aren’t the ones in trouble. The ones in trouble are companies locked into a single vendor with no ability to route. As the Ramp AI Index puts it: “The top 1% of firms tend to mix and match, bouncing between multiple frontier models and platforms that give them access to cheaper models.”
This isn’t about spending less on AI. It’s about spending smarter. The teams that figure out task-level routing now will have a structural cost advantage as agentic workflows become the default.