Don't trust large context windows

I recently watched a video that put a name on something I’d been feeling. The author splits an LLM’s context window into two zones: the smart zone, where the model is sharp, and the dumb zone, where attention drops off and the model starts forgetting what you told it five minutes ago. The author puts the cutoff somewhere around 100k tokens, no matter how big the advertised context window is.

This matters because coding agents will happily walk you straight into the dumb zone. A modern agent burns through tokens fast. A few file reads, a long debug session, a sprawling test run, and you’re at 100k before lunch. Meanwhile vendors keep advertising windows of 200k, 1M, even 2M, as if those numbers represented a usable working set. They don’t.

Work like the RULER benchmark and Chroma’s report on context rot shows that effective context is often only a fraction of the advertised number, and that performance degrades gradually as you fill the window, so the 100k cutoff is a rule of thumb rather than a cliff. The advertised window size is mostly a marketing number. The architectures behind it work, but they paper over a problem the underlying attention mechanism doesn’t really solve. The number on the box gets bigger every release. The usable part doesn’t keep up.

Modern agents are getting smart about this. Tools like Claude Code now auto-compact: when the session gets long, the agent summarizes the history and starts fresh. That helps. But auto-compaction kicks in after you’ve already spent time in the dumb zone, and the summary is itself produced by a model that’s already degraded. Better than nothing, but I’d rather avoid the situation altogether.

What I do is open a new session and pass it a spec I wrote myself. That’s a much higher-signal handoff than any automated summary, because I get to decide what matters going forward. It’s the breadcrumb approach applied to agents: leave an artifact that the next session, or the next person, can pick up cleanly.
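
As a rough illustration of what such a handoff artifact could look like, here is a minimal Python sketch. The `HandoffSpec` class, its field names, and the `HANDOFF.md` filename are all hypothetical choices made for this example; the post does not prescribe any particular format, and a hand-written text file works just as well.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class HandoffSpec:
    """Hypothetical structure for a hand-written session handoff."""
    goal: str
    decisions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    relevant_files: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as a short Markdown document."""
        def section(title: str, items: list[str]) -> str:
            # Empty sections are kept so the next reader sees nothing was skipped.
            body = "\n".join(f"- {item}" for item in items) or "- (none)"
            return f"## {title}\n{body}"

        parts = [
            f"# Handoff spec\n\n## Goal\n{self.goal}",
            section("Decisions already made", self.decisions),
            section("Next steps", self.next_steps),
            section("Relevant files", self.relevant_files),
        ]
        return "\n\n".join(parts) + "\n"

    def write(self, path: Path) -> Path:
        """Write the rendered spec to disk and return the path."""
        path.write_text(self.render(), encoding="utf-8")
        return path


if __name__ == "__main__":
    spec = HandoffSpec(
        goal="Fix the flaky login test",
        decisions=["Keep the existing retry logic"],
        next_steps=["Mock the clock in the test setup"],
    )
    # Assumed filename; the new session is pointed at this file first.
    spec.write(Path("HANDOFF.md"))
```

The point of the structure is that a person, not the degraded session, chooses what goes into each section.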

You can take this further. Projects like obra/superpowers and mattpocock/skills structure entire agent workflows around small, named artifacts: PRDs, plans, skills, sub-agent handoffs. Each one is a way to keep the working session in the smart zone by deliberately moving information out of the session and into something the next session can read.

So I treat my context window like a budget. I assume only the first chunk is really working for me, and everything I can move out of the live session and into a written artifact is one less thing for attention to fight over.
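
To make the budget idea concrete, here is a small Python sketch of a token ledger. It is only an illustration under stated assumptions: the 4-characters-per-token estimate is a crude heuristic rather than a real tokenizer, the 100k default is the rule of thumb from above rather than a measured limit, and the `ContextBudget` class and its 80% warning threshold are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; not a real tokenizer."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    """Hypothetical ledger of what a session has spent so far."""
    smart_zone_tokens: int = 100_000  # rule of thumb, not a hard limit
    warn_fraction: float = 0.8        # assumed margin for planning a handoff
    spent: int = 0
    ledger: list[tuple[str, int]] = field(default_factory=list)

    def record(self, label: str, text: str) -> int:
        """Charge a piece of text (file read, log, reply) to the budget."""
        cost = estimate_tokens(text)
        self.spent += cost
        self.ledger.append((label, cost))
        return cost

    @property
    def remaining(self) -> int:
        return max(0, self.smart_zone_tokens - self.spent)

    def should_hand_off(self) -> bool:
        """True once spending crosses the warning threshold."""
        return self.spent >= self.smart_zone_tokens * self.warn_fraction

    def biggest_spenders(self, n: int = 3) -> list[tuple[str, int]]:
        """The entries most worth moving out into a written artifact."""
        return sorted(self.ledger, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    budget = ContextBudget()
    budget.record("test run output", "FAILED test_login ... " * 5_000)
    budget.record("source file", "def handler(request): ... " * 2_000)
    if budget.should_hand_off():
        print("Time to write a spec and start a fresh session.")
    print(budget.biggest_spenders())
```

Even a crude ledger like this makes it obvious which items, usually logs and test output, are eating the part of the window that still works.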
