Uber Burned Through Its Entire AI Coding Budget in 4 Months. Here's What Smart Teams Do Instead.

The AI coding bill just became everyone’s problem. In the last two weeks alone:
Uber blew through its entire 2026 Claude Code budget by April and capped employees at $1,500/month.
Gartner reported that 23% of tech leaders now spend $200-500 per developer per month on AI coding tokens alone.
GitHub flipped Copilot to usage-based billing, turning a predictable $19/seat into an open-ended credit drain.
Ramp’s AI Index shows the top 1% of firms spending $7,500 per employee per month on AI ($90K per head per year), up 14.1% in a single month.

The pattern is clear: agentic workflows burn tokens faster than any flat budget anticipated. Single-vendor lock-in makes it worse. When your only option is Opus 4.8 at $75/M output tokens, every wasted thinking loop is expensive.

The Real Problem: Not All Tasks Need the Best Model

Here’s what I learned after watching my own AI coding spend hit $10K/month earlier this year. I was sending everything to Claude Opus. Code planning? Opus. Writing unit tests? Opus. Formatting a config file? Opus. Renaming a variable across three files? Opus. That’s like hiring a senior architect to move furniture. The work gets done, but you’re massively overpaying.

When I actually profiled my usage, the breakdown looked like this:
~15% of tasks genuinely needed frontier reasoning (complex architecture decisions, subtle bug diagnosis, multi-file refactors with tricky dependencies).
~25% needed solid mid-tier capability (implementing features from clear specs, writing meaningful tests, code review).
~60% were mechanical (formatting, renaming, boilerplate generation, simple file operations, documentation updates).

That 60% was burning frontier-tier tokens on work that Haiku, Gemini Flash, or even a local model could handle just as well.
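
To see why that 60% matters, weight each tier's share of tasks by a per-token price. The sketch below is hypothetical. It assumes every task uses roughly the same number of tokens, so task share equals token share. Apart from the $75/M frontier figure quoted above, the prices are placeholders, not vendor quotes.

```python
# Illustrative blended-cost estimate for task-level routing.
# ASSUMPTIONS (hypothetical, not measured):
#   - every task consumes about the same number of tokens, so the task
#     share from the usage profile is also the token share;
#   - the mid and fast prices are placeholders; only the $75/M frontier
#     output price comes from the article.
PRICE_PER_M_OUTPUT = {"frontier": 75.0, "mid": 15.0, "fast": 5.0}

# Task mix from the profiled usage, in whole percent.
TASK_SHARE_PCT = {"frontier": 15, "mid": 25, "fast": 60}


def routed_monthly_cost(all_frontier_cost: float) -> float:
    """Estimate monthly spend if each tier's tasks go to that tier's model
    instead of sending every task to the frontier model."""
    frontier_price = PRICE_PER_M_OUTPUT["frontier"]
    # Share-weighted price, summed over tiers (percent * $/M tokens).
    weighted = sum(TASK_SHARE_PCT[t] * PRICE_PER_M_OUTPUT[t] for t in TASK_SHARE_PCT)
    # Scale the all-frontier bill by weighted price / all-frontier price.
    return all_frontier_cost * weighted / (100 * frontier_price)


if __name__ == "__main__":
    before = 10_000.0  # ~$10K/month with everything on the frontier model
    after = routed_monthly_cost(before)
    print(f"before ${before:,.0f}/month, after ~${after:,.0f}/month")
```

With these placeholder prices, the estimate comes out around $2,400/month. That figure depends heavily on the equal-token assumption. If frontier tasks each consume more tokens (plausible for long planning and debugging sessions), their share of spend exceeds their share of tasks, and the savings shrink.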

Task-Level Routing: The Boring Fix That Saves 60-70%

The concept is simple: instead of routing every request to one model, classify each task and send it to the cheapest model that can handle it well.

Planning phase → Frontier model (Opus, GPT-5). This is where reasoning depth matters. You want the model that catches edge cases your spec missed.
Implementation → Mid-tier model (Sonnet, GPT-4.1). Given a clear plan, most code generation doesn’t need maximum intelligence — it needs reliable instruction-following.
Tests, formatting, docs → Fast/cheap model (Haiku, Gemini 2.5 Flash). These tasks have outputs that are easy to check. Either the test passes or it doesn’t. You don’t need 200 IQ for assertEqual.
Debug/diagnosis → Frontier model again. When something breaks in a non-obvious way, you want the best reasoning available.
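
The tier list above can be written as a lookup table. This is a minimal sketch that assumes an orchestrator already tags each task with a type. The task-type names are hypothetical. The model labels (`opus`, `sonnet`, `haiku`) are placeholders, not real API model identifiers.

```python
from enum import Enum


class Tier(Enum):
    FRONTIER = 1  # planning, debugging/diagnosis
    MID = 2       # implementation from a clear plan, code review
    FAST = 3      # tests, formatting, docs


# Hypothetical task-type labels; use whatever your orchestrator emits.
TASK_TIER = {
    "plan": Tier.FRONTIER,
    "debug": Tier.FRONTIER,
    "implement": Tier.MID,
    "review": Tier.MID,
    "test": Tier.FAST,
    "format": Tier.FAST,
    "docs": Tier.FAST,
}

# Placeholder labels, not real API model IDs; substitute your providers' names.
TIER_MODEL = {
    Tier.FRONTIER: "opus",
    Tier.MID: "sonnet",
    Tier.FAST: "haiku",
}


def route(task_type: str) -> str:
    """Pick a model per task, not per session.

    Unknown task types fall back to the frontier tier, so an unclassified
    task costs more but is never sent to a weaker model by accident.
    """
    tier = TASK_TIER.get(task_type, Tier.FRONTIER)
    return TIER_MODEL[tier]


if __name__ == "__main__":
    # One coding session, several tasks, each routed on its own.
    session = ["plan", "implement", "test", "format"]
    print([route(task) for task in session])  # ['opus', 'sonnet', 'haiku', 'haiku']
```

Falling back to the frontier tier is a judgment call. Raising an error on unknown task types would instead force every new task type to be classified explicitly.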

After implementing this approach, my monthly spend dropped from ~$10K to ~$3K. Same output quality. Same velocity. Just stopped overpaying for routine work.

How to Actually Do This

You don’t need custom infrastructure. Here’s the practical version:

Audit Your Token Usage: Before optimizing, know where your tokens go. Log the actual prompts hitting the API for a week. You’ll probably find context bloat, unnecessary thinking loops, and repeated system prompts.
Create Task Categories: Start simple — three tiers is enough: Tier 1 (Frontier), Tier 2 (Mid), Tier 3 (Fast).
Route Based on the Task, Not the Session: The key insight: routing should happen at the task level. A single coding session might need Opus for design, Sonnet for implementation, and Haiku for tests.
Monitor and Adjust: Track cost-per-task, not just total spend. When you see a Tier 3 task consuming $2 worth of tokens on a frontier model, that’s a routing failure.
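
The audit and monitoring steps come down to grouping logged calls by task instead of by month. This sketch assumes each API call is logged with four fields: a task ID, the tier the task should have used, the tier of the model that served it, and its cost. The `CallRecord` fields and both function names are hypothetical. Map them from whatever your gateway or provider usage export actually records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged API call (hypothetical schema)."""
    task_id: str
    task_tier: str   # tier the task should use: "frontier", "mid", or "fast"
    model_tier: str  # tier of the model that actually served the call
    cost_usd: float


def cost_per_task(records: list[CallRecord]) -> dict[str, float]:
    """Sum spend per task, so individual expensive tasks stand out."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r.task_id] += r.cost_usd
    return dict(totals)


def routing_failures(records: list[CallRecord], threshold_usd: float = 2.0) -> list[str]:
    """Return IDs of fast-tier tasks that ran on a frontier model and cost
    at least threshold_usd in total, sorted alphabetically."""
    totals = cost_per_task(records)
    mismatched = {
        r.task_id
        for r in records
        if r.task_tier == "fast" and r.model_tier == "frontier"
    }
    return sorted(t for t in mismatched if totals[t] >= threshold_usd)


if __name__ == "__main__":
    log = [
        CallRecord("t1", "fast", "frontier", 1.50),
        CallRecord("t1", "fast", "frontier", 0.75),
        CallRecord("t2", "fast", "fast", 0.25),
        CallRecord("t3", "frontier", "frontier", 4.00),
    ]
    print(cost_per_task(log))
    print(routing_failures(log))  # ['t1']: a $2.25 fast-tier task on a frontier model
```

The $2 default mirrors the example above. Setting the threshold to zero lists every fast-tier task that landed on a frontier model, regardless of cost.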

The Bigger Picture

Ramp’s data tells an interesting story: the companies spending the most on AI aren’t the ones in trouble. The ones in trouble are companies locked into a single vendor with no ability to route. As the Ramp AI Index puts it: “The top 1% of firms tend to mix and match, bouncing between multiple frontier models and platforms that give them access to cheaper models.”

This isn’t about spending less on AI. It’s about spending smarter. The teams that figure out task-level routing now will have a structural cost advantage as agentic workflows become the default.