Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

Abstract: Driving vision-language-action (VLA) models that incorporate Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained vision-language model (VLM) representations and expose intermediate decisions in natural language. Yet current rationales often lack the step-by-step decision semantics needed to keep them causally connected to the planned motion.

We introduce Neuro-Symbolic Drive, a neuro-symbolic driving framework that supervises a driving VLA with rule-grounded reasoning traces extracted directly from classical rule-based planners. Our key observation is that rule-based planners are symbolic AI systems that already function as executable reasoning engines: they reason about active safety constraints, search over candidate maneuvers, and select a final trajectory.
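
To make the "executable reasoning engine" view concrete, the following is a minimal sketch of a toy rule-based planner that performs the three steps named above: it determines which safety constraints are active, filters candidate maneuvers against them, and selects one. Everything here is an illustrative assumption rather than the planner used in this work: the `Scene` and `Candidate` types, the rule names, the 20 m gap threshold, and the "fastest feasible maneuver" selection criterion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    lead_gap_m: float       # distance to the lead vehicle in the ego lane
    speed_limit_mps: float  # posted speed limit
    left_lane_free: bool    # whether the left lane can be entered


@dataclass(frozen=True)
class Candidate:
    name: str
    target_speed_mps: float
    lane_shift: int  # 0 = keep lane, +1 = move one lane to the left


def active_constraints(scene):
    """Step 1: decide which (hypothetical) safety rules apply in this scene."""
    rules = []
    if scene.lead_gap_m < 20.0:  # assumed threshold, for illustration only
        rules.append("keep_safe_gap")
    rules.append("respect_speed_limit")
    if not scene.left_lane_free:
        rules.append("no_left_lane_change")
    return rules


def violates(candidate, rule, scene):
    """Return True if the candidate maneuver breaks the given rule."""
    if rule == "keep_safe_gap":
        # Staying in lane behind a close lead vehicle requires slowing down.
        return candidate.lane_shift == 0 and candidate.target_speed_mps > 5.0
    if rule == "respect_speed_limit":
        return candidate.target_speed_mps > scene.speed_limit_mps
    if rule == "no_left_lane_change":
        return candidate.lane_shift > 0
    return False


def plan(scene, candidates):
    """Steps 2 and 3: search the candidates, then select a final maneuver."""
    rules = active_constraints(scene)
    feasible = [
        c for c in candidates
        if not any(violates(c, r, scene) for r in rules)
    ]
    # Assumed selection criterion: the fastest feasible maneuver wins.
    selected = max(feasible, key=lambda c: c.target_speed_mps).name if feasible else "emergency_stop"
    return {
        "active_rules": rules,
        "feasible": [c.name for c in feasible],
        "selected": selected,
    }


if __name__ == "__main__":
    scene = Scene(lead_gap_m=12.0, speed_limit_mps=13.9, left_lane_free=True)
    candidates = [
        Candidate("cruise", 13.0, 0),
        Candidate("follow_slow", 4.0, 0),
        Candidate("change_left", 13.0, 1),
    ]
    print(plan(scene, candidates))
```

The dictionary returned by `plan` already contains the ingredients of a reasoning trace (active rules, surviving candidates, final choice), which is the sense in which such a planner exposes its decision process.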

We instrument these planners in simulation to capture both the executed trajectory and the internal decision trace at each rule-evaluation step. Each trace is serialized into structured rule-grounded reasoning and paired with the executed trajectory to fine-tune Qwen3.5-4B as a driving VLA. Because these traces are derived directly from the planner states that determine the action, the reasoning is structurally coupled to motion generation by construction rather than through post-hoc alignment.
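
As a rough illustration of the serialization step, the sketch below turns a list of recorded rule evaluations into a text rationale and pairs it with the executed trajectory as one supervised sample. The abstract does not specify the actual trace schema or text template, so the `RuleEvaluation` fields, the sentence template, and the `reasoning`/`trajectory` keys are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class RuleEvaluation:
    """One recorded rule-evaluation step (hypothetical schema)."""
    step: int
    rule: str
    candidate: str
    passed: bool


def serialize_trace(evaluations, selected):
    """Render recorded planner steps as a step-by-step text rationale."""
    lines = []
    for ev in evaluations:
        verdict = "satisfied" if ev.passed else "violated"
        lines.append(
            f"Step {ev.step}: rule '{ev.rule}' is {verdict} by candidate '{ev.candidate}'."
        )
    lines.append(f"Decision: execute '{selected}'.")
    return "\n".join(lines)


def build_sample(evaluations, selected, trajectory):
    """Pair the serialized rationale with the trajectory the planner executed."""
    return {
        "reasoning": serialize_trace(evaluations, selected),
        # Waypoints as [x, y] pairs in metres, rounded for compact targets.
        "trajectory": [[round(x, 2), round(y, 2)] for x, y in trajectory],
    }


if __name__ == "__main__":
    trace = [
        RuleEvaluation(1, "keep_safe_gap", "cruise", False),
        RuleEvaluation(2, "keep_safe_gap", "change_left", True),
        RuleEvaluation(3, "respect_speed_limit", "change_left", True),
    ]
    sample = build_sample(trace, "change_left", [(0.0, 0.0), (6.5, 0.4), (13.0, 1.6)])
    print(json.dumps(sample, indent=2))
```

Because the rationale is generated from the same records that determined the selected maneuver, the text cannot describe a decision the planner did not make, which is the coupling "by construction" described above.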

On our simulator-generated benchmark, detailed rule-grounded reasoning reduces ADE@3s (average displacement error over a 3-second horizon) from 0.47 to 0.26 and miss rate from 8.30% to 6.40% under three-camera perception, and from 0.54 to 0.26 and from 10.13% to 5.99%, respectively, under eight-camera perception. Neuro-Symbolic Drive thus converts neuro-symbolic planning logic into structured supervision.
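
For readers unfamiliar with these metrics, here is a minimal sketch of how an average displacement error and a miss rate can be computed from predicted and ground-truth waypoints. The abstract does not state the exact definitions used, so this follows one common convention as an assumption: ADE is the mean Euclidean distance over all waypoints in the horizon, and a sample counts as a miss when its final-waypoint error exceeds a threshold (2.0 m here, a hypothetical value).

```python
import math


def ade(pred, gt):
    """Mean Euclidean distance between matched predicted and true waypoints."""
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists)


def miss_rate(preds, gts, threshold_m=2.0):
    """Fraction of samples whose final-waypoint error exceeds the threshold.

    The threshold and the use of the final waypoint are assumptions.
    """
    misses = 0
    for pred, gt in zip(preds, gts):
        (px, py), (gx, gy) = pred[-1], gt[-1]
        if math.hypot(px - gx, py - gy) > threshold_m:
            misses += 1
    return misses / len(preds)


def relative_reduction(before, after):
    """Relative improvement of a lower-is-better metric."""
    return (before - after) / before


if __name__ == "__main__":
    gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    exact = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    offset = [(1.0, 3.0), (2.0, 3.0), (3.0, 3.0)]
    print(ade(offset, gt), miss_rate([exact, offset], [gt, gt]))
    print(relative_reduction(0.47, 0.26), relative_reduction(0.54, 0.26))
```

By this arithmetic, the reported ADE@3s changes correspond to relative reductions of roughly 45% (0.47 to 0.26) and 52% (0.54 to 0.26).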
