PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single behavioral mode. Recent work introduces style variation through post-hoc labels on observational data or LLM-inferred reward weights, but these signals act as proxies for what a style should reward rather than demonstrations of humans explicitly asked to drive in that style.

We introduce PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, in which participants drive CARLA leaderboard routes under aggressive, neutral, and conservative instructions on a driver-in-the-loop rig. The pipeline has three stages: (i) offline triplet mining over per-style human driving data using a combined image-text similarity score; (ii) training a lightweight retrieval head that fuses frozen visual features with a small control encoder over per-style databases; and (iii) fine-tuning a single VLA backbone to treat retrieved context points as in-context behavioral demonstrations during waypoint prediction.
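
Stage (i) can be illustrated with a minimal sketch. The abstract states only that triplets are mined per style with a combined image-text similarity score; everything else here is an assumption made for illustration: the weighted-sum form of the score, the weight `alpha`, the choice of the most similar sample as positive and the least similar as negative, and the hypothetical names `combined_score` and `mine_triplets`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_score(a, b, alpha=0.5):
    """Assumed form of the combined score: weighted sum of image and text cosine similarity."""
    return alpha * cosine(a["img"], b["img"]) + (1 - alpha) * cosine(a["txt"], b["txt"])


def mine_triplets(samples, alpha=0.5):
    """Return (anchor, positive, negative) index triplets for one style's sample pool.

    Assumption: the positive is the highest-scoring other sample and the
    negative is the lowest-scoring one. The paper may use a different rule.
    """
    triplets = []
    for i, anchor in enumerate(samples):
        scored = [
            (combined_score(anchor, other, alpha), j)
            for j, other in enumerate(samples)
            if j != i
        ]
        if len(scored) < 2:
            continue  # need at least two other samples to form a triplet
        positive = max(scored)[1]
        negative = min(scored)[1]
        triplets.append((i, positive, negative))
    return triplets


# Toy pool for a single style; the embeddings are made up for the example.
aggressive_pool = [
    {"img": [1.0, 0.0], "txt": [1.0, 0.0]},
    {"img": [0.9, 0.1], "txt": [1.0, 0.0]},
    {"img": [0.0, 1.0], "txt": [0.0, 1.0]},
]
print(mine_triplets(aggressive_pool))  # [(0, 1, 2), (1, 0, 2), (2, 1, 0)]
```

Running the miner once per style pool keeps the triplets style-specific, which matches the per-style framing of the stage; the resulting index triplets would then feed whatever metric-learning loss trains the retrieval head.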

At inference, the same backbone is conditioned on any style by swapping which per-style database the retrieval head queries, so selecting a style requires no per-style retraining while enabling human-style, style-diverse non-ego agents for closed-loop simulation.
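
The style switch described above can be sketched as a retriever that holds one database per style and takes the style as a query-time argument. This is a toy illustration under stated assumptions, not the authors' implementation: the class `StyleRetriever`, the cosine-similarity ranking, the two-dimensional keys, and the waypoint values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class StyleRetriever:
    """Hypothetical retriever: one shared query path, one database per style."""

    def __init__(self, databases):
        # Maps a style name to a list of (key_vector, demo_waypoints) entries.
        self.databases = databases

    def retrieve(self, query, style, k=1):
        """Return the k demonstrations from `style` whose keys best match `query`."""
        ranked = sorted(
            self.databases[style],
            key=lambda entry: cosine(query, entry[0]),
            reverse=True,
        )
        return [demo for _, demo in ranked[:k]]


# Made-up per-style databases: same scene keys, different demonstrated waypoints.
retriever = StyleRetriever({
    "aggressive": [([1.0, 0.0], [2.0, 4.0, 6.0]), ([0.0, 1.0], [1.5, 3.0, 4.5])],
    "conservative": [([1.0, 0.0], [1.0, 2.0, 3.0]), ([0.0, 1.0], [0.5, 1.0, 1.5])],
})

scene_query = [0.9, 0.1]
# Only the database changes between calls; the retrieval code does not.
print(retriever.retrieve(scene_query, "aggressive"))    # [[2.0, 4.0, 6.0]]
print(retriever.retrieve(scene_query, "conservative"))  # [[1.0, 2.0, 3.0]]
```

Because the style enters only through the choice of database, adding a style in this sketch means adding a dictionary entry rather than retraining anything, which mirrors the claim that style selection requires no per-style retraining.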

On Bench2Drive, PersonaDrive without style conditioning improves the driving score by 4.6% over SimLingo and 2.5% over HiP-AD. Under style conditioning, it attains the highest driving score in every style, with its scores staying within a roughly 2% band across styles; even its weakest style surpasses the strongest baseline, DMW, by 5.4%. Average speed and acceleration rise by 18% and 25%, respectively, from the conservative to the aggressive instruction.