PREPING: Building Agent Memory without Tasks

Abstract: Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without any task-specific experience available.

In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Synthetic interaction alone is insufficient, however: without controlling what to practice and what to store, synthetic tasks become redundant or infeasible and ultimately uninformative, and unfiltered trajectories quickly degrade the memory.

To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback to guide future proposals.
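The Proposer–Solver–Validator loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all class names, task strings, and acceptance criteria are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class ProposerMemory:
    """Structured control state shaping future practice (illustrative)."""
    covered_skills: set = field(default_factory=set)
    failed_tasks: list = field(default_factory=list)

def propose(state: ProposerMemory) -> str:
    """Proposer: generate a synthetic task conditioned on the control state,
    preferring skills not yet covered (hypothetical candidate pool)."""
    candidates = ["search_contacts", "send_email", "update_calendar"]
    for task in candidates:
        if task not in state.covered_skills:
            return task
    return candidates[0]  # fall back to revisiting a covered skill

def solve(task: str) -> list:
    """Solver stand-in: execute the task and return a trajectory of steps."""
    return [f"call:{task}", f"observe:{task}_ok"]

def validate(trajectory: list) -> bool:
    """Validator stand-in: accept only trajectories that reached success."""
    return any(step.endswith("_ok") for step in trajectory)

def build_memory(rounds: int = 3) -> list:
    state = ProposerMemory()
    procedural_memory = []
    for _ in range(rounds):
        task = propose(state)
        traj = solve(task)
        if validate(traj):
            procedural_memory.append((task, traj))  # eligible for insertion
            state.covered_skills.add(task)          # feedback guides proposals
        else:
            state.failed_tasks.append(task)         # feedback guides proposals
    return procedural_memory

memory = build_memory()
print(len(memory))  # → 3
```

The key design point is that the Validator writes back into the proposer memory, so acceptance decisions both gate memory insertion and steer what gets practiced next.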

Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, at a deployment cost $2.99\times$ lower on AppWorld and $2.23\times$ lower on BFCL v3 than online memory construction.

Further analyses reveal that the main benefit does not come from synthetic volume alone, but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
