PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single behavioral mode. Recent work introduces style variation through post-hoc labels on observational data or LLM-inferred reward weights, but these signals are proxies for what a style should reward, not demonstrations by humans who were explicitly asked to drive in that style.

We introduce PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, in which participants drive CARLA leaderboard routes under aggressive, neutral, and conservative instructions on a driver-in-the-loop rig. The pipeline has three stages: (i) offline triplet mining over per-style human driving data using a combined image-text similarity score; (ii) training a lightweight retrieval head that fuses frozen visual features with a small control encoder over per-style databases; and (iii) fine-tuning a single VLA backbone to treat retrieved context points as in-context behavioral demonstrations during waypoint prediction.
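Stage (i) can be illustrated with a minimal sketch of triplet mining under a combined image-text similarity score. Everything specific here is an assumption made for illustration rather than the paper's implementation: the embeddings are plain Python lists, the score is a weighted sum of two cosine similarities with a hypothetical weight `alpha`, and the positive and negative are simply the highest- and lowest-scoring other samples within one style's data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(a, b, alpha=0.5):
    """Weighted sum of image and text similarity; alpha is a hypothetical weight."""
    return alpha * cosine(a["img"], b["img"]) + (1.0 - alpha) * cosine(a["txt"], b["txt"])


def mine_triplets(samples, alpha=0.5):
    """Return (anchor, positive, negative) index triplets for one style's samples.

    Assumed rule: the positive is the most similar other sample and the
    negative is the least similar other sample under the combined score.
    """
    triplets = []
    for i, anchor in enumerate(samples):
        scored = [
            (combined_score(anchor, other, alpha), j)
            for j, other in enumerate(samples)
            if j != i
        ]
        if len(scored) < 2:
            continue  # need at least one positive and one distinct negative
        positive = max(scored)[1]
        negative = min(scored)[1]
        triplets.append((i, positive, negative))
    return triplets


# Toy per-style data: samples 0 and 1 are near-duplicates, sample 2 differs.
toy_samples = [
    {"img": [1.0, 0.0], "txt": [1.0, 0.0]},
    {"img": [0.9, 0.1], "txt": [1.0, 0.0]},
    {"img": [0.0, 1.0], "txt": [0.0, 1.0]},
]
toy_triplets = mine_triplets(toy_samples)
```

Because the mining runs offline, the quadratic scan over samples is acceptable for a sketch; a larger dataset would likely call for an approximate nearest-neighbor index instead.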

At inference, the same backbone can be conditioned on any of the styles by swapping the per-style database that the retrieval head queries. Selecting a style therefore requires no per-style retraining, which yields human-style, style-diverse non-ego agents for closed-loop simulation.
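The database swap can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's retrieval head: the class name `StyleRetriever`, the cosine ranking, and the string-valued demonstrations are all hypothetical. The point it shows is that the style is only a lookup key into per-style databases, so nothing in the model itself changes when the style changes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class StyleRetriever:
    """Hypothetical retriever holding one demonstration database per style."""

    def __init__(self, databases):
        # databases: style name -> list of (key_vector, demonstration) pairs
        self.databases = databases

    def retrieve(self, style, query, k=2):
        """Return the k demonstrations from the chosen style closest to the query."""
        entries = self.databases[style]
        ranked = sorted(entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [demo for _, demo in ranked[:k]]


# Toy databases; the demonstration labels are placeholders, not real data.
retriever = StyleRetriever({
    "aggressive": [([1.0, 0.0], "fast_merge"), ([0.0, 1.0], "late_brake")],
    "conservative": [([1.0, 0.0], "slow_merge"), ([0.0, 1.0], "early_brake")],
})

scene_query = [1.0, 0.1]  # the same scene embedding is used for every style
aggressive_context = retriever.retrieve("aggressive", scene_query, k=1)
conservative_context = retriever.retrieve("conservative", scene_query, k=1)
```

In this sketch the same query returns different demonstrations depending only on the `style` argument, which mirrors how a single fine-tuned backbone could receive different in-context demonstrations per style.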

On Bench2Drive, PersonaDrive without style conditioning improves the driving score by 4.6% over SimLingo and by 2.5% over HiP-AD. Under style conditioning, it attains the highest driving score in every style, with its per-style scores falling within a roughly 2% band; even its weakest style surpasses the strongest baseline, DMW, by 5.4%. Average speed and acceleration rise by 18% and 25%, respectively, from the conservative to the aggressive instruction.
