I built a Claude Code plugin that only keeps an "agent memory" rule if it can prove it saves tokens

I built a Claude Code plugin that only keeps an “agent memory” rule if it can prove it saves tokens

我开发了一个 Claude Code 插件:只有能证明节省 Token 的“智能体记忆”规则才会被保留

Most agent-memory setups add a rule because a model thought it sounded useful. I wanted one that has to earn its spot. token-warden watches your Claude Code sessions, distills candidate efficiency rules from the expensive ones, then benchmarks each rule on a frozen test suite (with vs without it) and only keeps it if it saves at least 2x the tokens it costs to carry around, and breaks nothing. Everything else gets evicted.

大多数智能体记忆(agent-memory)设置添加规则仅仅是因为模型觉得它听起来有用。我想要的是一种必须“凭实力”赢得席位的机制。token-warden 会监控你的 Claude Code 会话,从高成本的操作中提炼出潜在的效率规则,然后在固定的测试集上对每条规则进行基准测试(对比启用与禁用后的效果)。只有当规则节省的 Token 数量至少是其自身消耗量的 2 倍,且不破坏任何功能时,它才会被保留。其余所有规则都会被剔除。

It once threw out a rule that saved 38k tokens per run because that “saving” came from the agent giving up and failing the task. Honest result so far: on a deliberately wasteful agent, a “grep for the symbol before reading the whole file” rule cut a session from ~67k to ~56k tokens (about 16%, roughly 3 cents/session on Sonnet, ~500x what the rule itself costs). On my already-optimized agents the same rule saves basically nothing and gets auto-evicted, which is the entire point.

它曾经剔除过一条单次运行可节省 3.8 万 Token 的规则,因为这种“节省”是以智能体放弃任务并导致失败为代价的。目前的真实测试结果是:在一个故意设置得非常浪费的智能体上,“在读取整个文件前先 grep 搜索符号”的规则将单次会话从约 6.7 万 Token 减少到了约 5.6 万 Token(节省约 16%,在 Sonnet 模型上大约节省 3 美分/会话,是规则自身成本的 500 倍)。而在我已经优化过的智能体上,同样的规则几乎无法节省任何 Token,因此会被自动剔除——这正是该插件的核心意义所在。

It refuses to keep junk just to look busy. It’s measured, not vibes. Open source, MIT. I’d genuinely like people to try to break it or tell me the numbers are wrong. https://github.com/vukkt/token-warden

它拒绝为了显得忙碌而保留垃圾规则。它是基于数据衡量的,而不是靠“感觉”。项目已开源,采用 MIT 协议。我真心希望大家能试着去“破坏”它,或者指出我的数据有误。https://github.com/vukkt/token-warden