CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents

Abstract: Personalized dialogue agents need persistent long-term memory to stay coherent across sessions. Deploying that memory on consumer-grade hardware (e.g., edge devices with 8 GB of VRAM), however, creates severe memory and compute bottlenecks. Existing systems typically rely on isotropic cosine similarity for retrieval and on heuristic rules for context compression. These approaches lack a unified theoretical foundation: retrieval frequently suffers from the hubness problem in high-dimensional spaces, and compression frequently fragments syntax.

To overcome these limitations, we propose CoreMem, a resource-efficient edge-cloud memory architecture unified by information geometry. First, Riemannian retrieval replaces cosine matching with a locally adaptive Fisher-Rao metric. The metric penalizes hub memories through a Mahalanobis distance, and a Woodbury-identity acceleration keeps the cost at O(Ndr) for real-time search.
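
The Woodbury step can be illustrated with a minimal NumPy sketch. It assumes a single covariance of the form Sigma = lam * I + U U^T with a d x r factor U; this structure, the function name `mahalanobis_sq_woodbury`, and the parameters `U` and `lam` are illustrative assumptions, not the paper's implementation, and the locally adaptive part of the metric is not reproduced here.

```python
import numpy as np

def mahalanobis_sq_woodbury(query, memories, U, lam):
    """Squared Mahalanobis distances under Sigma = lam * I + U @ U.T.

    query: (d,), memories: (N, d), U: (d, r), lam: positive float.
    Assumed low-rank-plus-isotropic covariance (hypothetical structure).
    The Woodbury identity avoids forming or inverting the d x d matrix.
    """
    diff = memories - query                        # (N, d)
    proj = diff @ U                                # (N, r): the O(N d r) step
    r = U.shape[1]
    core = np.eye(r) + (U.T @ U) / lam             # (r, r) inner matrix
    solved = np.linalg.solve(core, proj.T).T       # (N, r)
    iso = np.einsum("nd,nd->n", diff, diff) / lam  # isotropic term
    corr = np.einsum("nr,nr->n", proj, solved) / lam**2
    return iso - corr

# Toy usage with random data (all shapes are arbitrary choices).
rng = np.random.default_rng(0)
memories = rng.normal(size=(100, 16))
query = rng.normal(size=16)
U = rng.normal(size=(16, 3))
dists = mahalanobis_sq_woodbury(query, memories, U, lam=0.5)
nearest = np.argsort(dists)[:5]  # indices of the 5 closest memories
```

Under these assumptions the only work that scales with both N and d is the projection `diff @ U` and the squared norms, so the d x d inverse is never materialized.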

Second, Fisher-guided discrete token distillation (FDTD) introduces a hierarchical sentence-to-token compression mechanism. It derives sensitivity scores from Fisher information traces, which yields a principled tradeoff between compression and KL divergence, and it adds explicit protection for structural syntax.
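
To make the scoring idea concrete, here is a hedged token-level sketch. It assumes that each token comes with a gradient vector of the log-likelihood, uses the empirical Fisher trace tr(g g^T) = ||g||^2 as the sensitivity score, and models syntax protection as a fixed set of tokens that are never dropped. The helper names, the `keep_ratio` budget, and the `protected` set are hypothetical, and the sentence-level stage of the hierarchy is omitted.

```python
import numpy as np

def fisher_trace_scores(token_grads):
    """Per-token empirical Fisher trace: tr(g g^T) = ||g||^2 for each row g."""
    return np.einsum("tp,tp->t", token_grads, token_grads)

def select_tokens(tokens, scores, keep_ratio, protected):
    """Keep protected tokens plus the highest-scoring others, in original order.

    If the protected tokens alone exceed the budget, all of them are kept.
    """
    budget = max(1, int(round(keep_ratio * len(tokens))))
    keep = {i for i, tok in enumerate(tokens) if tok in protected}
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    for i in ranked:
        if len(keep) >= budget:
            break
        keep.add(i)
    return [tokens[i] for i in sorted(keep)]

# Toy usage: the scores are made up for illustration, not measured values.
tokens = ["she", "did", "not", "really", "like", "the", "movie", "."]
scores = np.array([3.0, 0.5, 0.1, 0.2, 4.0, 0.05, 5.0, 0.01])
compressed = select_tokens(tokens, scores, keep_ratio=0.5, protected={"not", "."})
```

In this toy case the negation "not" survives despite its low score, which is the kind of failure a purely score-based pruner would otherwise introduce.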

Evaluated on the LOCOMO and LongMemEval-S benchmarks, CoreMem improves accuracy, with gains of +4.51 pp in Open-domain reasoning and +4.17 pp in Temporal reasoning. Profiling confirms that CoreMem runs within a strict 8 GB VRAM budget, bridging the gap between resource-constrained edge devices and the demand for theoretically grounded lifelong memory agents.
