The Token Tax Problem: How I Built a Super Memory Layer for AI Coding Assistants Using an LLM Wiki

We Solved the Wrong Problem First

When AI coding assistants arrived, we celebrated. Faster delivery. Less repetitive work. Developers doing more meaningful things. Then the invoices arrived. Token utilization had quietly become one of the fastest-growing line items in engineering costs. Every session, every agent, every code suggestion — all of it burning through context tokens. And the root cause was embarrassingly simple: we were paying for AI tools to re-learn our codebase from scratch, over and over again.

Round One: The Obvious Fixes

We started with the basics. Things that genuinely helped:

  • Context window hygiene — Being deliberate about what goes into context rather than dumping entire file trees at every agent invocation.
  • Model switching — Using faster, cheaper models for repetitive low-complexity tasks and reserving powerful models for architecture decisions and complex debugging.
  • Preprocessed context — Writing structured markdown instruction files that encode team conventions once so they can be reused everywhere, instead of expecting agents to infer them from raw code (a minimal example follows this list).
  • Scoped agents — Purpose-built agents for specific tasks (test generation, code review, planning) rather than one general-purpose agent doing everything.
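
For the preprocessed-context point, here is a sketch of what such an instruction file can look like. Every convention below is invented for illustration; the point is the shape, not the content:

```markdown
<!-- .github/copilot-instructions.md: hypothetical example -->
# Team conventions (read before exploring source)

- All HTTP calls go through `src/lib/apiClient.ts`; never call `fetch` directly.
- Shared state lives in stores under `src/stores/`; do not introduce new state libraries.
- Tests are co-located as `*.test.ts` next to the unit under test.
```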

These helped. But they didn’t solve the fundamental issue. Agents were still spending tokens exploring the codebase before doing any real work. We needed something closer to a cache layer.

The Core Idea: A Super Memory Layer

The inspiration came from Andrej Karpathy’s concept of the LLM Wiki — the idea that an AI system benefits enormously from a persistent, structured knowledge index rather than re-reading raw source on every request. Think of it like CloudFront or Redis in front of your origin server. Instead of every agent making expensive round trips into raw source code, they read from a pre-built knowledge graph. That graph becomes a shared memory layer — a single source of architectural truth accessible to any AI tool: Copilot, Factory, Claude, Cursor, or whatever comes next.

For the implementation, I used Graphify (github.com/safishamsi/graphify), an open-source tool that converts a codebase into a knowledge graph:

  • Nodes — functions, components, hooks, utilities
  • Edges — relationships between them (imports, calls, dependencies)
  • Output — a plain-language report, an interactive visualization, and GraphRAG-ready JSON (an illustrative schema sketch follows this list)
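
To make the GraphRAG-ready JSON concrete, here is roughly the shape such a graph document takes, expressed as TypeScript types. These field and union names are assumptions for this article, not Graphify's documented schema:

```typescript
// Illustrative shape of a GraphRAG-ready knowledge graph.
// NOTE: field names here are assumptions, not Graphify's actual schema.
interface GraphNode {
  id: string;                                  // e.g. "checkout/useCartTotals"
  kind: "function" | "component" | "hook" | "utility";
  file: string;                                // path relative to the module root
}

interface GraphEdge {
  from: string;                                // source node id
  to: string;                                  // target node id
  kind: "import" | "call" | "dependency";
}

interface KnowledgeGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}
```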

The POC: Steps We Actually Followed

Step 1 — Full Codebase Attempt (Hit a Wall)

First instinct: run it on the entire codebase at once. The corpus immediately exceeded the tool’s recommended limits (~900+ files). This is actually a healthy constraint; feeding an LLM a massive, undifferentiated codebase produces poor graph quality anyway. Lesson: large codebases need a per-module strategy.

Step 2 — Module-by-Module Analysis

We split the codebase into independent modules and ran the graph pipeline on each one separately; a sketch of the loop appears below. Each run was completely free: Graphify’s AST extraction is pure static analysis with zero LLM API calls.
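
A minimal sketch of the per-module loop, in TypeScript for Node. The `graphify-cli` command and its flags are placeholders standing in for however you invoke the pipeline, not the tool's documented interface:

```typescript
import { execSync } from "node:child_process";
import { readdirSync } from "node:fs";
import path from "node:path";

// Treat each top-level directory under src/ as an independent module.
const modulesRoot = "src";
const modules = readdirSync(modulesRoot, { withFileTypes: true })
  .filter((entry) => entry.isDirectory())
  .map((entry) => entry.name);

for (const mod of modules) {
  const input = path.join(modulesRoot, mod);
  const output = path.join("graphs", mod);
  // Placeholder invocation: substitute your actual pipeline command here.
  execSync(`graphify-cli ${input} --out ${output}`, { stdio: "inherit" });
}
```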

Step 3 — Debugging the Tool Itself

During a couple of runs, report generation failed due to API signature changes between Graphify versions. We patched the calls and kept moving. Lesson: pin your open-source tooling versions; APIs shift.

Step 4 — Merging Module Graphs

With the individual module graphs ready, we wrote a merge script to combine them into a single unified knowledge graph. The first attempt had a subtle bug: the script accidentally read the same module’s extract file multiple times, producing a graph full of duplicates. We caught it because all the node sets were identical. Fix: rebuild the merge from each module’s actual AST cache files, prefixing node IDs with the module name to prevent collisions, as sketched below.
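
The corrected merge, sketched in TypeScript. The `graphs/<module>/ast-cache.json` layout and the node/edge field names are assumptions carried over from the schema sketch above, not Graphify's actual cache format:

```typescript
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import path from "node:path";

type GraphNode = { id: string; [key: string]: unknown };
type GraphEdge = { from: string; to: string; [key: string]: unknown };
type Graph = { nodes: GraphNode[]; edges: GraphEdge[] };

const merged: Graph = { nodes: [], edges: [] };
const seen = new Set<string>();

for (const entry of readdirSync("graphs", { withFileTypes: true })) {
  if (!entry.isDirectory()) continue;
  const mod = entry.name;

  // One AST cache file per module (assumed layout).
  const cachePath = path.join("graphs", mod, "ast-cache.json");
  const graph: Graph = JSON.parse(readFileSync(cachePath, "utf8"));

  for (const node of graph.nodes) {
    // Prefix ids with the module name so two modules that both define,
    // say, `formatDate` cannot collide in the merged graph.
    const id = `${mod}/${node.id}`;
    if (seen.has(id)) continue; // guards against reading the same extract twice
    seen.add(id);
    merged.nodes.push({ ...node, id });
  }

  for (const edge of graph.edges) {
    // This sketch assumes edges are module-local; cross-module edges
    // would need a smarter id mapping.
    merged.edges.push({ ...edge, from: `${mod}/${edge.from}`, to: `${mod}/${edge.to}` });
  }
}

writeFileSync(path.join("graphs", "merged.json"), JSON.stringify(merged, null, 2));
```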

Step 5 — Discovering the God Nodes

The most valuable output wasn’t the graph itself; it was what the graph revealed. God nodes are the most connected abstractions in the codebase: the functions, utilities, and components that everything else depends on (a ranking sketch follows the list below). Once you know your god nodes, you can:

  • Prioritise documentation specifically for these high-impact functions.
  • Instruct agents to proceed carefully whenever changes touch them.
  • Use them as architectural anchors in any context window.
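
“Most connected” is just degree in the merged graph, so the shortlist falls out of a few lines. Again, this is a sketch against the assumed schema above, not a built-in Graphify feature:

```typescript
import { readFileSync } from "node:fs";

type GraphEdge = { from: string; to: string };
type Graph = { edges: GraphEdge[] };

const graph: Graph = JSON.parse(readFileSync("graphs/merged.json", "utf8"));

// Degree = number of edges touching a node, in either direction.
const degree = new Map<string, number>();
for (const { from, to } of graph.edges) {
  degree.set(from, (degree.get(from) ?? 0) + 1);
  degree.set(to, (degree.get(to) ?? 0) + 1);
}

// The top of this list is the god-node shortlist.
const godNodes = [...degree.entries()]
  .sort(([, a], [, b]) => b - a)
  .slice(0, 20);

console.table(godNodes.map(([id, connections]) => ({ id, connections })));
```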

Step 6 — Wiring the Graph into Agent Instructions

We updated the agent instruction files used by each tool to point at the merged graph report as their primary architecture reference; an illustrative excerpt follows below. Any agent that loads these instructions starts with architectural knowledge already in place, without scanning source files to build that understanding itself. A 9 KB markdown report replacing several megabytes of source scanning. Every session.
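
In practice the wiring is just a pointer plus a standing order. A hypothetical excerpt (the paths and phrasing are invented, not a standard):

```markdown
<!-- excerpt from an agent instruction file (hypothetical) -->
## Architecture reference

Read `graphs/GRAPH_REPORT.md` before opening any source file. It lists
every module, its public surface, and the god nodes that most code
depends on. Treat changes that touch a god node as high-risk and say so.
Only fall back to scanning raw source when the report cannot answer
your question.
```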

Step 7 — Running the Token Experiment

To quantify the impact, we set up an A/B test: we commented the graph instructions out of all agent configuration files, ran identical tasks in both configurations, and compared token consumption. Results from the experiment will follow in a separate post.
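
The comparison itself is trivial once you have per-task token totals from each tool's usage logs; the only code involved is arithmetic like this (the numbers below are placeholders, not our results):

```typescript
// Percentage of tokens saved by the graph-backed configuration.
function tokenSavings(baselineTokens: number, withGraphTokens: number): number {
  return ((baselineTokens - withGraphTokens) / baselineTokens) * 100;
}

// Placeholder inputs for illustration only; real numbers in the follow-up post.
console.log(`${tokenSavings(120_000, 45_000).toFixed(1)}% saved`);
```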

The Three Outputs: What Agents Actually Consume

Every Graphify run produces three files, each serving a different consumer:

File              Typical Size   Best For
GRAPH_REPORT.md   ~9 KB          Copilot, Cursor, any LLM reading markdown