AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint.

AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memories, AURA-Mem trains its gate directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, whereas a KV-cache grows to 6,061 times that size by 100,000 steps.
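
To make the gating idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes, purely for illustration, that the state is 1,056 float32 values (1,056 × 4 = 4,224 bytes; the actual layout is not stated here), and it replaces the learned gate with a fixed threshold on how much a candidate write would change the next action. `GatedMemory`, `toy_policy`, the blend factor, and the threshold are all hypothetical names and values.

```python
from array import array

STATE_FLOATS = 1056  # assumption: 1056 float32 values * 4 bytes = 4224 bytes


class GatedMemory:
    """Constant-size recurrent state with an action-change write gate (toy)."""

    def __init__(self, threshold=0.1, blend=0.5):
        self.state = array("f", [0.0] * STATE_FLOATS)
        self.threshold = threshold  # hypothetical stand-in for the learned gate
        self.blend = blend          # hypothetical recurrent update weight
        self.writes = 0

    def step(self, obs, policy):
        """Commit a write only if it would change the policy's next action."""
        candidate = array("f", (
            (1.0 - self.blend) * s + self.blend * o
            for s, o in zip(self.state, obs)
        ))
        action_keep = policy(self.state)
        action_write = policy(candidate)
        if abs(action_write - action_keep) > self.threshold:
            self.state = candidate  # same length, so the footprint is unchanged
            self.writes += 1
            return action_write
        return action_keep  # gate stays silent, nothing is written

    def nbytes(self):
        return len(self.state) * self.state.itemsize


def toy_policy(state):
    # Hypothetical scalar "action": mean of the first 8 state entries.
    return sum(state[:8]) / 8.0


mem = GatedMemory()
num_steps = 1000
for t in range(num_steps):
    level = float(t // 100)  # the scene changes only every 100 steps
    mem.step([level] * STATE_FLOATS, toy_policy)

# KV-cache size implied by the figures in the text (6,061x at 100,000 steps).
kv_bytes = 6061 * mem.nbytes()
print("state bytes:", mem.nbytes())
print("writes:", mem.writes, "of", num_steps, "steps")
print("implied KV-cache bytes at 100,000 steps:", kv_bytes)
```

In this toy stream most observations are redundant, so the gate writes on only a small fraction of steps while the state size never changes. If KV-cache growth is assumed to be linear in the number of steps, the 6,061× figure corresponds to 25,601,664 bytes at 100,000 steps, or roughly 256 bytes per step.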

On a controlled synthetic benchmark, AURA-Mem matches the accuracy of the best O(1) baseline while using 5.19 to 6.13 times fewer writes, and up to 9.19 times fewer on easier configurations. Budget-matched random and periodic write schedules do not recover this gain, which isolates the benefit to the action-surprise signal rather than to the reduced write count itself.
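
The budget-matched controls can be sketched as follows. This is an illustrative construction under stated assumptions, not the benchmark's actual code: given the number of writes a gate used over an episode, it builds a periodic schedule and a seeded random schedule that spend exactly the same write budget. The function names and the example budget are hypothetical.

```python
import random


def periodic_schedule(num_steps, num_writes):
    """Evenly spaced write mask that uses exactly num_writes writes."""
    mask = [False] * num_steps
    for i in range(num_writes):
        # Integer spacing; indices are distinct whenever num_writes <= num_steps.
        mask[(i * num_steps) // num_writes] = True
    return mask


def random_schedule(num_steps, num_writes, seed=0):
    """Uniformly random write mask that uses exactly num_writes writes."""
    rng = random.Random(seed)
    mask = [False] * num_steps
    for t in rng.sample(range(num_steps), num_writes):
        mask[t] = True
    return mask


# Hypothetical example: a gate that wrote on 163 of 1000 steps.
num_steps, gate_writes = 1000, 163
periodic = periodic_schedule(num_steps, gate_writes)
rand = random_schedule(num_steps, gate_writes)
print(sum(periodic), sum(rand))  # both schedules spend the same budget
```

Because all three arms write the same number of times, any accuracy gap between them reflects when the writes happen, not how many there are.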

On a trained closed-loop OpenVLA-OFT 7B panel on LIBERO-Long (n=60 episodes per arm), the gate does not hurt success: AURA-Mem matches the ungated base policy (0.233) and slightly exceeds an always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale, the bound is vacuous rather than a guarantee.
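
As a reading aid for the success rates above, the sketch below converts them back to episode counts and computes Wilson score intervals. It assumes the reported rates are plain fractions of 60 episodes (14/60 rounds to 0.233 and 13/60 rounds to 0.217) and uses a conventional 95% level; neither the counts nor the interval method are stated in the text, so both are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N = 60  # episodes per arm, from the text
# Assumed counts: the smallest integers consistent with the rounded rates.
arms = {"AURA-Mem (same rate as ungated base)": 14, "always-write KV": 13}

intervals = {}
for name, k in arms.items():
    lo, hi = wilson_interval(k, N)
    intervals[name] = (lo, hi)
    print(f"{name}: {k}/{N} = {k / N:.3f}, 95% Wilson interval [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the two arms differ by a single episode and their intervals overlap, which fits the framing that the gate does not hurt success and does not support a claim of a meaningful improvement.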
