I built a vector embedding cache that makes stale hits structurally impossible
Wrote up the design behind embcache, a GPU-native two-tier cache for embeddings and KV states. The problem it solves: embedding caches that key on content hash alone silently return stale vectors after a model upgrade or tokenizer change. The cache looks healthy. The vectors are wrong.
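To make the failure mode concrete, here is a minimal illustration (hypothetical code, not from embcache) of a cache keyed on content hash alone; the stand-in `embed` function is an assumption used purely to show that a model upgrade never invalidates the key:

```python
import hashlib

cache = {}

def embed(text: str, model_version: int) -> list[float]:
    # Stand-in for a real embedding model; the output depends on the version.
    return [float(model_version)] * 3

def cached_embed(text: str, model_version: int) -> list[float]:
    # Key on content hash ONLY -- the model version is not part of the key.
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in cache:
        cache[key] = embed(text, model_version)
    return cache[key]

v1 = cached_embed("hello", model_version=1)  # populated under model v1
v2 = cached_embed("hello", model_version=2)  # after an "upgrade": stale hit
assert v2 == v1  # cache still serves v1 vectors; looks healthy, is wrong
```

Every request after the upgrade is a hit, so hit-rate metrics stay green while every returned vector comes from the old model.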
The fix is a composite EmbeddingFingerprint covering model_id, tokenizer hash, chunking strategy, normalization version, prompt template, and dataset version. No partial matches, so no path to a stale hit from a pipeline change.
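A sketch of what an all-or-nothing composite key can look like, assuming the six axes listed above; the field names, hashing scheme, and `cache_key` method are my assumptions for illustration, not embcache's actual API:

```python
import hashlib
from dataclasses import astuple, dataclass

@dataclass(frozen=True)
class EmbeddingFingerprint:
    # Hypothetical field names covering the six axes from the post.
    model_id: str
    tokenizer_hash: str
    chunking_strategy: str
    normalization_version: str
    prompt_template_hash: str
    dataset_version: str

    def cache_key(self, content_hash: str) -> str:
        # Every axis feeds the key, so any pipeline change yields a new key:
        # matches are all-or-nothing, never partial.
        payload = "|".join(astuple(self) + (content_hash,))
        return hashlib.sha256(payload.encode()).hexdigest()

old = EmbeddingFingerprint("bge-large-v1", "tok-abc", "512/overlap-64",
                           "l2-v1", "tmpl-1", "ds-2024-01")
new = EmbeddingFingerprint("bge-large-v2", "tok-abc", "512/overlap-64",
                           "l2-v1", "tmpl-1", "ds-2024-01")
# A model upgrade changes one axis, so the same content now misses:
assert old.cache_key("abc123") != new.cache_key("abc123")
```

Folding the axes into the key itself (rather than checking them at lookup time) means a stale entry is simply unreachable; it ages out of the cache instead of being served.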
Full writeup with benchmarks (98.3% hit rate, 400-450x speedup on KV cache hits) on Medium: https://bh3r1th.medium.com/the-vector-embedding-cache-bug-that-costs-nothing-and-corrupts-everything-157be6c575e8
Repo: https://github.com/bh3r1th/embcache (not on PyPI yet). Looking for feedback, especially on whether the fingerprint schema covers all the axes that could cause a stale hit in your pipeline.