AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint.

AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memory, the gate is trained directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, whereas a KV-cache is 6,061 times larger by 100,000 steps.
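
As a rough illustration of the gating idea, the sketch below keeps a fixed-size state and commits a write only when absorbing the observation would shift the policy's action. It is not the paper's implementation: the class `GatedMemory`, the hand-set `threshold`, the `blend` update rule, and the toy `policy` callable are all assumptions made for this example, and the fixed threshold stands in for the learned gate described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GatedMemory:
    """Fixed-size recurrent state with a threshold write gate (illustrative only)."""
    size: int = 8
    threshold: float = 0.1   # hypothetical; the paper's gate is learned, not hand-set
    blend: float = 0.5       # hypothetical recurrent update weight
    state: List[float] = field(default_factory=list)
    writes: int = 0

    def __post_init__(self) -> None:
        # The state is allocated once and never grows with the horizon.
        self.state = [0.0] * self.size

    def step(self, obs: List[float], policy: Callable[[List[float]], List[float]]) -> List[float]:
        # Candidate state if the observation were written into memory.
        candidate = [(1 - self.blend) * s + self.blend * o
                     for s, o in zip(self.state, obs)]
        # "Action surprise": how far the action moves if we commit the write.
        a_keep = policy(self.state)
        a_write = policy(candidate)
        surprise = max(abs(x - y) for x, y in zip(a_keep, a_write))
        if surprise > self.threshold:
            self.state = candidate
            self.writes += 1
        return policy(self.state)


if __name__ == "__main__":
    toy_policy = lambda s: [sum(s)]          # stand-in for a frozen backbone
    mem = GatedMemory(size=4)
    for _ in range(100):
        mem.step([0.0] * 4, toy_policy)      # uninformative observations
    mem.step([1.0] * 4, toy_policy)          # observation that changes the action
    print(mem.writes, len(mem.state))
```

In this toy setting the hundred uninformative observations trigger no writes, and the state length never changes. Taken at face value, the figures above put the comparison KV-cache at roughly 4,224 × 6,061 ≈ 25.6 MB at 100,000 steps.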

On a controlled synthetic benchmark, AURA-Mem matches the best O(1) baseline in accuracy while using 5.19 to 6.13 times fewer writes, and up to 9.19 times fewer writes on easier configurations. Budget-matched random and periodic schedules do not recover this gain, isolating the benefit to the action-surprise signal.
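
To make "budget-matched" concrete, the sketch below builds periodic and random write schedules that spend exactly the same number of writes as a gated run, so any accuracy gap can be attributed to when the writes happen rather than how many there are. The function names and the example budget are hypothetical and are not taken from the paper.

```python
import random
from typing import List


def periodic_schedule(num_steps: int, budget: int) -> List[bool]:
    """Spread `budget` writes evenly across `num_steps` steps."""
    if not 0 <= budget <= num_steps:
        raise ValueError("budget must lie between 0 and num_steps")
    writes = [False] * num_steps
    for k in range(budget):
        # Indices are distinct because budget <= num_steps.
        writes[(k * num_steps) // budget] = True
    return writes


def random_schedule(num_steps: int, budget: int, seed: int = 0) -> List[bool]:
    """Place `budget` writes at uniformly random, distinct steps."""
    if not 0 <= budget <= num_steps:
        raise ValueError("budget must lie between 0 and num_steps")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(num_steps), budget))
    return [t in chosen for t in range(num_steps)]


if __name__ == "__main__":
    num_steps, budget = 1000, 163            # hypothetical episode length and write count
    for schedule in (periodic_schedule(num_steps, budget),
                     random_schedule(num_steps, budget)):
        assert sum(schedule) == budget       # same write budget as the gated run
    print(f"write reduction vs. always-write: {num_steps / budget:.2f}x")
```

Both schedules use the same budget by construction; they differ from a gated run only in which steps receive the writes.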

On a trained closed-loop OpenVLA-OFT 7B panel on LIBERO-Long (n=60 episodes per arm), the gate does not hurt success: AURA-Mem matches the ungated base policy (0.233) and slightly exceeds an always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale, the bound is vacuous rather than a guarantee.
