Give Your AI Unlimited Updated Context

The architecture behind a portable knowledge layer and the automation that keeps it alive.

Sara Nobrega | May 7, 2026 | 10 min read

Andrej Karpathy (founding member of OpenAI) posted a GitHub gist earlier this year. It’s called “LLM Wiki.” About 1,500 words. It describes a pattern where you build a personal wiki that an LLM maintains for you: a persistent, compounding artifact that gets richer every time you add to it. Knowledge compiled once and kept current, rather than re-derived from scratch on every query.

Most people probably read it, thought “that’s interesting,” and closed the tab. I built it. This article shows how to set it up, and what I learned during implementation.

The problem with how most people use AI right now

Every conversation starts blank. You open a chat, explain who you are, what you’re working on, what you decided last week. You get a useful response. You close the tab. Tomorrow you do it again.

The tool works fine, but the context layer underneath it is missing. Built-in memory helps a little: Claude remembers your name and job title; ChatGPT knows you prefer bullet points. But neither knows the details of your active projects, the deal you’re about to close, the vendor you ruled out last month, or what happened in your pipeline this week. That kind of operational state doesn’t live anywhere persistent.

The option most engineers reach for next is RAG. RAG is genuinely useful, but it’s solving a different problem. It re-derives knowledge from scratch on every query. You embed documents, retrieve chunks at query time, and hope the right fragments surface. Nothing accumulates. A question that requires synthesising five documents means the LLM has to find and reassemble those fragments every single time.

The vault approach in this article compiles knowledge once and keeps it current. When you add something new, the LLM indexes it, reads it, integrates it, updates related pages, flags contradictions, and maintains cross-references. The synthesis is already done before you ask your next question. Karpathy puts it cleanly: the wiki is a persistent, compounding artifact. The cross-references are already there. The analysis doesn’t disappear into chat history. It builds.

The architecture: two folders and a schema file

The core structure fits in a single directory tree:

vault/
├── CLAUDE.md ← schema file, entry point for any AI
├── Raw/ ← immutable source documents
│   ├── Meeting Notes/
│   ├── Documents/
│   └── _pending.md ← compilation queue
└── Wiki/ ← LLM-generated, structured, indexed
    ├── Projects/
    ├── People/
    ├── Decisions/
    ├── _hot.md ← active cache
    ├── _log.md ← audit trail
    └── _index.md ← master index

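The layout above can be bootstrapped with a short script. This is a minimal sketch, not part of the original article: all directory and file names follow the tree shown here, and the placeholder headers are hypothetical.

```python
from pathlib import Path

# Bootstrap the vault skeleton described in the tree above.
VAULT_ROOT = Path("vault")

RAW_DIRS = ["Raw/Meeting Notes", "Raw/Documents"]
WIKI_DIRS = ["Wiki/Projects", "Wiki/People", "Wiki/Decisions"]
CONTROL_FILES = {
    "CLAUDE.md": "# Vault schema\n",        # entry point for any AI
    "Raw/_pending.md": "# Compilation queue\n",
    "Wiki/_hot.md": "# Active cache\n",
    "Wiki/_log.md": "# Audit trail\n",
    "Wiki/_index.md": "# Master index\n",
}

def bootstrap(root: Path) -> None:
    for d in RAW_DIRS + WIKI_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for name, header in CONTROL_FILES.items():
        path = root / name
        if not path.exists():  # never clobber an existing vault
            path.write_text(header, encoding="utf-8")

if __name__ == "__main__":
    bootstrap(VAULT_ROOT)
```

Running it twice is safe: existing files are left untouched, which matters once the AI starts maintaining them.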
Raw is your source of truth. Meeting transcripts, exported Slack threads, documents pulled from wherever your work actually happens. The rule is absolute: the AI reads Raw, never edits it. Append-only.

Wiki is what the AI builds and maintains. One file per project, person, decision, or domain area. Structured, cross-referenced. This is what the AI reads first when you ask a question.

If you’ve worked with data pipelines, this split is familiar. Raw is your landing zone. Wiki is your curated layer. If Wiki drifts or gets corrupted, you rebuild from Raw. You never lose the source.

The schema file sits at the root and tells any AI how the vault is organised, what to read first, and what the operating rules are. I call it CLAUDE.md. If you’re using Codex, AGENTS.md works. Name it anything, as long as you point the AI to it at the start of every session.
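
To make this concrete, here is a minimal sketch of what such a schema file might contain. The section names and wording are illustrative, not a prescribed format; the rules themselves come from the article.

```markdown
# Vault Schema

## Layout
- Raw/ — immutable source documents (meeting notes, exports). Read-only.
- Wiki/ — LLM-maintained pages, one file per project, person, or decision.

## Read order
1. Wiki/_hot.md (always first)
2. The relevant domain index
3. Specific Wiki pages, then Raw/ sources if detail is missing

## Hard rules
- Never edit files in Raw/.
- Never invent facts not present in source files.
- Always append a timestamped entry to Wiki/_log.md after every run.
```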

The three control files

This is the part most implementations skip, and it’s why most implementations quietly die. A folder of markdown files is not a system. These three files make it one.

  • _hot.md is the cache. Every morning, the daily automation rewrites this file with the most active threads, any key numbers or deadlines that surfaced, and one line on anything urgent. It stays under 500 tokens. When you open a conversation and want a fast briefing, the AI reads _hot.md first, no need to load the full Wiki.

  • _pending.md is the queue. Every time a new file lands in Raw, its filename and date get appended here. When the weekly compilation runs, it reads this file, processes each entry, compiles it into Wiki, and marks it [COMPILED — 2026-05-01]. Without this file, the daily ingest and the weekly compilation can’t coordinate. You get orphaned raw files and a Wiki that’s weeks behind.

  • _log.md is the audit trail. Every automated run appends a timestamped entry: what ran, what files were processed, what Wiki pages were created or updated. If the system drifts, this is how you find where. Karpathy’s gist has a useful tip here: start each log entry with a consistent prefix like ## [2026-05-01] daily-ingest so the whole log is grep-parseable with basic Unix tools.

A vault without these files accumulates dust. With them, you have a working pipeline.
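
The coordination between _pending.md and _log.md can be sketched as a small ingest step. This is a hypothetical implementation of the pattern described above, not the article's actual automation; the entry formats are assumptions, apart from the grep-parseable `## [date] daily-ingest` log prefix, which follows Karpathy's tip.

```python
from datetime import date
from pathlib import Path

def daily_ingest(vault: Path) -> list[str]:
    """Queue any new Raw files in _pending.md and append an audit entry to _log.md."""
    pending = vault / "Raw" / "_pending.md"
    log = vault / "Wiki" / "_log.md"
    queued = pending.read_text(encoding="utf-8") if pending.exists() else ""

    new_files = []
    for f in sorted((vault / "Raw").rglob("*.md")):
        if f.name.startswith("_"):
            continue  # skip control files like _pending.md itself
        rel = f.relative_to(vault / "Raw").as_posix()
        if rel not in queued:  # crude dedup: substring check against the queue
            new_files.append(rel)

    today = date.today().isoformat()
    with pending.open("a", encoding="utf-8") as fh:
        for rel in new_files:
            fh.write(f"- {rel} (added {today})\n")

    # Consistent prefix keeps the whole log greppable: grep '^## \[' _log.md
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"## [{today}] daily-ingest\nqueued {len(new_files)} file(s)\n")
    return new_files
```

Run it from cron (or any scheduler) once a day; a second run on the same files queues nothing, because already-listed paths are skipped.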

The schema file: teaching any AI how to read your vault

CLAUDE.md is the entry point. Every session starts here. What goes in it:

  • The folder map (what’s in Raw, what’s in Wiki, what each subdirectory is for)
  • Read order (_hot.md always first, then the relevant domain index)
  • Hard rules: “never edit files in Raw/”, “never invent facts not present in source files”, “always append to _log.md after every run”
  • Domain structure (which indexes exist, how they’re named)

The schema file is also where you encode your prompting defaults. I use a well-known pattern, adapted directly into the schema: I want to [TASK] so that [WHAT SUCCESS LOOKS LIKE]. First, read the uploaded files completely before responding. DO NOT start executing yet. Ask me…