One Tool That Cuts Token Costs 40-80% for Claude Code, Codex, opencode, and openclaw

The problem isn’t your prompts. If you’re running Claude Code, Codex, opencode, or openclaw and the API bill keeps climbing, you’ve probably tried writing tighter prompts. That’s not where the waste is.

Four structural patterns account for most of the token spend in a typical session:

  • Screenshots at full resolution. The agent reads whatever images you paste or reference. A 3.3 MB screenshot from a high-DPI display lands in the model at full size. The model doesn’t need native resolution to understand what’s on screen.

  • Repeated file reads. The agent re-reads files it already touched earlier in the session. A 600-line file read three times costs 1,800 lines’ worth of tokens. There’s no built-in session memory to keep the second or third read from costing full price.

  • Compaction that loses context. When a session compacts, the summary doesn’t know which files were actively edited or which symbols mattered, so the next request starts with the wrong picture and prompts more reads.

  • Bash output floods. A pytest, npm install, docker build, or git log run can dump hundreds of lines of passing-test names, deprecation warnings, and progress bars. The model processes all of it at full token cost.

These compound. On a session with 10+ file reads, a few images, and a test run, you’re easily burning 3x the tokens you actually need.

token-goat fixes all four

token-goat (https://github.com/DFKHelper/token-goat) is a hook daemon for Claude Code, Codex CLI, opencode, and openclaw. Install once; it handles the rest.

  • Image shrinking. Intercepts screenshots before they reach the model and compresses them. A 3.3 MB PNG becomes 84 KB, 97.4% smaller.

  • Session-aware read hints. Tracks every file the agent reads in the session. When the agent is about to re-read one, it gets a hint such as: “you read lines 1–420 of auth.py 12 minutes ago.” Most re-reads stop.

  • Compaction assist. Before the session compacts, a hook builds a structured manifest (edited files, accessed symbols, key reads) and injects it into the compaction context. The next request starts with the right picture.

  • Bash output compression. Filters long-running command output before it hits the model. pytest goes from 150 passing-test lines to a failures-first view, 80–97% smaller. npm install collapses warnings by package. docker build keeps step headers and errors, drops the rest.
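
As a rough illustration of the failures-first idea, the sketch below filters verbose pytest output (`pytest -v` style lines ending in PASSED or FAILED). It is not token-goat's actual filter: the function name `compress_pytest_output` and the matching rules are assumptions made for this example.

```python
import re

# Hypothetical sketch, not token-goat's real implementation.
# Assumes `pytest -v` style lines such as
# "tests/test_auth.py::test_login PASSED [ 50%]".
PASSED = re.compile(r"\bPASSED\b")
FAILED = re.compile(r"\b(?:FAILED|ERROR)\b")


def compress_pytest_output(raw: str) -> str:
    """Move failure lines to the top and replace passing lines with a count."""
    failures, other, passed = [], [], 0
    for line in raw.splitlines():
        if PASSED.search(line):
            passed += 1            # drop the line, keep only the tally
        elif FAILED.search(line):
            failures.append(line)  # surfaced first in the result
        else:
            other.append(line)     # tracebacks, summary, anything unrecognised
    tail = [f"[{passed} passing-test lines omitted]"] if passed else []
    return "\n".join(failures + other + tail)
```

Passing lines collapse to a single count, so the saving grows with the size of the test suite while the failure details stay intact.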

It’s all automated, but you can also pull individual functions instead of whole files: token-goat read "src/auth.py::login". On a 2,000-line module, that’s 85% fewer tokens than reading the full file.
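
The `path::symbol` form suggests symbol-level extraction. How token-goat resolves the symbol is not described here; one plausible approach for Python files, sketched below with the standard-library `ast` module, is to parse the file and return only the lines of the matching top-level definition. `read_symbol` is a hypothetical name used for illustration.

```python
import ast


def read_symbol(source: str, name: str) -> str:
    """Return the source lines of the top-level function or class `name`.

    Illustrative only: decorators and nested definitions are not handled.
    """
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    lines = source.splitlines()
    for node in ast.parse(source).body:
        if isinstance(node, definitions) and node.name == name:
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            return "\n".join(lines[node.lineno - 1 : node.end_lineno])
    raise KeyError(f"no top-level symbol named {name!r}")
```

The caller receives only the requested definition, so the token cost tracks the size of the function rather than the size of the module.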

The numbers

100K wasted tokens per session runs about $0.30. At five sessions a day that is $1.50 a day, or roughly $450 over 300 days of use. AI coding cost reduction at that scale comes from eliminating structural waste, not from writing shorter prompts.
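
The $0.30 figure implies a price of about $3 per million tokens. A short sketch of the arithmetic follows, with that price as an assumed default and `wasted_cost` as an illustrative name; substitute your own model's rate.

```python
def wasted_cost(tokens_per_session: int, sessions: int = 1,
                usd_per_million_tokens: float = 3.0) -> float:
    """Dollar cost of avoidable tokens (price is an assumption, not a quote)."""
    return tokens_per_session * sessions * usd_per_million_tokens / 1_000_000


print(wasted_cost(100_000))                  # 0.3 for one session
print(wasted_cost(100_000, sessions=1_500))  # 450.0 for 1,500 sessions
```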

token-goat is free. After four hours of use on my machine: 59.7 MB of data that never hit the model, 11.5 million tokens avoided. And that was just version 0.1.

Install

Requires uv (https://docs.astral.sh/uv/).

uv tool install token-goat
token-goat install

Works with Claude Code, Codex CLI, opencode, and openclaw. Runs on Windows, Linux, WSL, and macOS.

https://github.com/DFKHelper/token-goat