Δ-Mem: Efficient Online Memory for Large Language Models

Large language models increasingly need to accumulate and reuse historical information when deployed as long-term assistants and in agent systems. Simply expanding the context window is costly and often fails to ensure that the context is used effectively.

We propose $\Delta$-Mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online associative-memory state. $\Delta$-Mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses the state's readout to generate low-rank corrections to the backbone's attention computation during generation.
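
To make the mechanism concrete, the sketch below shows a delta-rule state update, an associative readout, and an additive low-rank correction to a frozen attention output. It is a minimal illustration assuming single-vector keys, queries, and values; the names (`DeltaMemState`, `W_down`, `W_up`) and the write rate `beta` are hypothetical and do not come from the paper.

```python
import torch

class DeltaMemState:
    """Fixed-size associative memory updated by the delta rule.

    Minimal sketch under stated assumptions: the class name, the
    write rate `beta`, and the adapters below are illustrative,
    not the paper's actual implementation.
    """

    def __init__(self, dim: int = 8):
        # Compact online state: a dim x dim matrix (8x8 in the abstract).
        self.S = torch.zeros(dim, dim)

    def write(self, k: torch.Tensor, v: torch.Tensor, beta: float = 0.5):
        # Delta-rule update: nudge the state so that S @ k moves toward v.
        #   S <- S + beta * (v - S k) k^T
        err = v - self.S @ k                    # prediction error for key k
        self.S = self.S + beta * torch.outer(err, k)

    def read(self, q: torch.Tensor) -> torch.Tensor:
        # Associative readout: o = S q.
        return self.S @ q


def corrected_attention(h, attn_out, mem, W_down, W_up):
    """Apply a low-rank memory correction to a frozen attention output.

    W_down (dim x d_model) and W_up (d_model x dim) are assumed to be
    small trainable adapters; the backbone itself stays frozen.
    """
    q = W_down @ h              # project hidden state down to the memory dim
    o = mem.read(q)             # read from the compact online state
    return attn_out + W_up @ o  # additive low-rank correction
```

Note that the state stays $8\times8$ no matter how much history has been written into it, which is what keeps the memory's cost constant during generation.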

With only an $8\times8$ online memory state, $\Delta$-Mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\Delta$-Mem memory baseline. The gains are larger on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities.

These results show that effective memory can be realized through a compact online state coupled directly with the attention computation, without full fine-tuning, backbone replacement, or explicit context extension.