RAG Is Blind to Time — I Built a Temporal Layer to Fix It in Production

How I added temporal awareness and freshness tracking to a RAG system that had no sense of time.

Emmimal P Alexander | May 9, 2026 | 24 min read

Three weeks into testing, a learner messaged me about a wrong answer. She had asked the tutor about a concept from one of my Generative AI tutorials. The response looked fine. But it wasn't. I had already rewritten that content two months earlier. My RAG system pulled a version from six months ago: not obviously wrong, just wrong enough to mislead. She thought she had misunderstood. She hadn't. My own system was teaching her from lessons I had already replaced.

I'm building a RAG-powered assistant for EmiTechLogic, my tech education platform, turning a content library into a system that generates answers directly from my own articles. I wrote about the initial architecture here. The initial architecture was manageable. The real challenge begins when real learners hit a live system.

When I pulled the retrieval logs, I saw exactly what happened. Both versions were in the vector store. The old one ranked first because it had more matching tokens and a higher cosine similarity score. The updated version came in second. Sometimes third. I expected the newer document to win automatically. That's not how cosine similarity works. The system was doing exactly what it was designed to do, which turned out to be the problem.

The pattern held across other queries too. Python tutorials I had updated, model comparison guides I had revised. Old versions kept surfacing first. The AI tool I was building was quietly teaching people from lessons I had already replaced.

Here's what that looked like in practice (same query, same corpus, naive RAG):

QUERY: What are the API rate limits? Will I get a 429 error?

NAIVE RAG

  1. [policy_v1] age=540d | EXPIRED | sim=0.447 "API rate limits are set to 100 requests per minute…"
  2. [announcement_today] age=0d | valid | sim=0.329
  3. [tutorial_old] age=600d | EXPIRED | sim=0.303

A 540-day-old expired document was sitting at the top. The live announcement from 48 hours ago was ranked second. The retriever didn't care about freshness. It only matched words. I assumed freshness would be handled somewhere in the pipeline. It wasn't. Nobody had thought to add it.

This article is about how I fixed that. I built a temporal layer: a layer that sits between the vector search results and the LLM and makes the system care about time.

TL;DR, if you're short on time: vector search has no concept of when something was true. I fixed this by adding a reranking step between the retriever and the LLM, a step that hard-removes expired facts, boosts active time-bounded signals, and uses exponential decay to prefer newer documents. The tricky part was making sure "fresh" didn't override "relevant." The one-line version: naive RAG finds what's similar, temporal RAG finds what's still true.

Complete code: https://github.com/Emmimal/temporal-rag/

Who this is for: any RAG system where the knowledge base changes over time. If your system has ever given a confident answer from a document you had already updated, deprecated, or replaced, this is for you. It matters most for API and product documentation, incident and outage management, customer support knowledge bases, internal wikis and policy systems, and education platforms where content evolves.

Skip it if your knowledge base is static and never changes. Skip it if your content has no concept of expiry, versions, or time-bounded signals. Skip it if a stale answer carries no real consequence.

Why Vector Search Has No Sense of Time

The standard RAG pipeline embeds documents, embeds the query, finds the closest matches, and sends them to the model. That works fine if your information never changes. But if you are constantly publishing new guides and rewriting old ones, it fails silently. You might not even notice until a user complains. The vector store only knows the angle between vectors. It has no idea which document is six months old and which one I published last week.
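The "only knows the angle" point can be made concrete. The toy three-dimensional embeddings below are made up (real embeddings come from a model and have hundreds of dimensions), but they show that nothing in the cosine computation ever touches a timestamp:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the angle between two vectors, nothing more."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings, chosen only to illustrate the failure mode.
query   = [1.0, 0.2, 0.0]
old_doc = [0.9, 0.3, 0.0]  # six-month-old version, heavy term overlap
new_doc = [0.6, 0.1, 0.4]  # rewritten version, different wording

# The old version wins on similarity alone; its age never enters the math.
old_wins = cosine(query, old_doc) > cosine(query, new_doc)
```

However the documents are dated, `cosine` receives only the vectors, so any notion of time has to be bolted on after retrieval.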

The usual fixes are deleting old documents or adding metadata filters. I tried both. They helped for about two weeks, and then I updated my content again and the same problem returned. A document with a 20% penalty can still rank first if its word overlap is strong enough. When I looked closer, I realized this wasn't one big problem. It was actually three separate problems, and each one needs a different fix. I had been collapsing all three into one bucket called "stale content" and applying the same fix to all of them. That's why nothing was sticking.
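The arithmetic behind "a 20% penalty can still rank first" is worth spelling out, using the similarity scores from the retrieval log above (the flat 20% figure is the penalty I tried, not anything from the repo):

```python
# Similarity scores from the retrieval log above.
expired_policy_sim = 0.447     # policy_v1: 540 days old, already expired
live_announcement_sim = 0.329  # announcement_today: published 48 hours ago

STALE_PENALTY = 0.20  # the flat down-ranking that didn't work

penalized = expired_policy_sim * (1 - STALE_PENALTY)
# penalized is about 0.358: even after the penalty, the expired
# document still outranks the live announcement.
still_wrong_order = penalized > live_announcement_sim
```

A flat penalty just shifts the threshold; any expired document with strong enough word overlap sails over it, which is why expiry needs removal rather than down-ranking.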

Three Time Problems, Three Different Fixes

  1. Expiration: a fact that is now false. Some documents have an expiry date. Showing them after that date isn't a freshness issue. It's a lie. You can't just down-rank these. You have to remove them completely before the model ever sees them.

  2. Temporality: facts that are only true right now. Some information matters intensely for a short window. A live notice about a site outage or a 48-hour policy change isn't just extra context. It is the most important document in your knowledge base while its window is open. An hour after it closes, it is false.

  3. Versioning: a fact that has been replaced. This was my biggest problem. When I updated a document, both versions stayed in the vector store. The old one kept winning because it had more matching words. The fix here is neither removal nor boosting. Let time decay handle it. The newer document should naturally outscore the older one when recency is part of the ranking signal.
(Table omitted for brevity, but the logic follows the text above.)