Δ-Mem: Efficient Online Memory for Large Language Models

Large language models increasingly need to accumulate and reuse historical information when deployed as long-term assistants and in agent systems. Simply expanding the context window is costly and often fails to ensure that the context is used effectively.

We propose $\Delta$-Mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online associative-memory state. $\Delta$-Mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses the state's readout to generate low-rank corrections to the backbone's attention computation during generation.
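
To make the mechanism concrete, the sketch below shows a delta-rule state update, an associative readout, and an additive low-rank correction to a frozen attention output. It is a minimal illustration assuming single-vector keys, queries, and values; the names (`DeltaMemState`, `W_down`, `W_up`) and the write rate `beta` are hypothetical and do not come from the paper.

```python
import torch

class DeltaMemState:
    """Fixed-size associative memory updated by the delta rule.

    Minimal sketch under stated assumptions: the class name, the
    write rate `beta`, and the adapters below are illustrative,
    not the paper's actual implementation.
    """

    def __init__(self, dim: int = 8):
        # Compact online state: a dim x dim matrix (8x8 in the abstract).
        self.S = torch.zeros(dim, dim)

    def write(self, k: torch.Tensor, v: torch.Tensor, beta: float = 0.5):
        # Delta-rule update: nudge the state so that S @ k moves toward v.
        #   S <- S + beta * (v - S k) k^T
        err = v - self.S @ k                    # prediction error for key k
        self.S = self.S + beta * torch.outer(err, k)

    def read(self, q: torch.Tensor) -> torch.Tensor:
        # Associative readout: o = S q.
        return self.S @ q


def corrected_attention(h, attn_out, mem, W_down, W_up):
    """Apply a low-rank memory correction to a frozen attention output.

    W_down (dim x d_model) and W_up (d_model x dim) are assumed to be
    small trainable adapters; the backbone itself stays frozen.
    """
    q = W_down @ h              # project hidden state down to the memory dim
    o = mem.read(q)             # read from the compact online state
    return attn_out + W_up @ o  # additive low-rank correction
```

Note that the state stays $8\times8$ no matter how much history has been written into it, which is what keeps the memory's cost constant during generation.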

With only an $8\times8$ online memory state, $\Delta$-Mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\Delta$-Mem memory baseline. The gains are larger on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities.

These results show that effective memory can be realized through a compact online state coupled directly with the attention computation, without full fine-tuning, backbone replacement, or explicit context extension.