Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe. Agents must try novel behaviors to learn, yet doing so can lead to collisions or off-road driving.

We propose an uncertainty-aware framework that uses expert advice to guide exploration while avoiding long-term dependence on the expert policy.

Advice is triggered when epistemic or aleatoric uncertainty exceeds an adaptive threshold derived from rolling buffers, so that advice requests track the agent's evolving confidence.
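The triggering rule can be illustrated with a minimal sketch. The abstract states only that the thresholds are derived from rolling buffers; the upper-quantile rule, the window size, the warm-up count, and all names below (`RollingThreshold`, `should_request_advice`) are assumptions made for illustration, not the paper's implementation.

```python
from collections import deque


class RollingThreshold:
    """Assumed rule: threshold = an upper quantile of recent uncertainty values."""

    def __init__(self, window=500, quantile=0.9, min_samples=20):
        self.values = deque(maxlen=window)  # rolling buffer; oldest entries drop out
        self.quantile = quantile
        self.min_samples = min_samples

    def update(self, value):
        self.values.append(float(value))

    def threshold(self):
        # Until the buffer has enough samples, never trigger (infinite threshold).
        if len(self.values) < self.min_samples:
            return float("inf")
        ordered = sorted(self.values)
        index = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        return ordered[index]


def should_request_advice(epistemic, aleatoric, epi_buffer, ale_buffer):
    """Trigger if either uncertainty exceeds its own rolling threshold."""
    trigger = epistemic > epi_buffer.threshold() or aleatoric > ale_buffer.threshold()
    # Update after comparing, so the current value does not set its own bar.
    epi_buffer.update(epistemic)
    ale_buffer.update(aleatoric)
    return trigger
```

Because the threshold is recomputed from recent values only, a level of uncertainty that triggers advice early in training may no longer do so once the buffer fills with different values, which is one way advice could evolve with the agent's confidence.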

A commitment-cooldown strategy with a stochastic early-stop heuristic regulates the duration and frequency of guidance, exposing the agent to coherent maneuvers without exhausting the advice budget.
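One plausible reading of the commitment-cooldown mechanism is a small state machine, sketched below. The step counts, the per-step early-stop probability, the choice to start a cooldown after an early stop, and the class name `AdviceScheduler` are all hypothetical; the abstract does not specify them.

```python
import random


class AdviceScheduler:
    """Hypothetical commitment-cooldown scheduler with a random early stop."""

    def __init__(self, commit_steps=10, cooldown_steps=30,
                 early_stop_prob=0.05, budget=1000, rng=None):
        self.commit_steps = commit_steps
        self.cooldown_steps = cooldown_steps
        self.early_stop_prob = early_stop_prob
        self.budget = budget            # total expert-controlled steps allowed
        self.rng = rng or random.Random()
        self.commit_left = 0
        self.cooldown_left = 0

    def step(self, triggered):
        """Return True if the expert should choose the action at this step."""
        if self.commit_left > 0:
            # Inside a commitment window: keep following the expert unless the
            # budget is spent or the random early stop fires.
            if self.budget <= 0 or self.rng.random() < self.early_stop_prob:
                self.commit_left = 0
                self.cooldown_left = self.cooldown_steps
                return False
            return self._advise(self.commit_left - 1)
        if self.cooldown_left > 0:
            # Cooldown: ignore triggers so advice cannot fire back to back.
            self.cooldown_left -= 1
            return False
        if triggered and self.budget > 0:
            return self._advise(self.commit_steps - 1)
        return False

    def _advise(self, remaining):
        self.budget -= 1
        self.commit_left = remaining
        if remaining == 0:
            self.cooldown_left = self.cooldown_steps
        return True
```

In this sketch the commitment window yields a multi-step expert maneuver rather than isolated single-step corrections, while the cooldown and the early stop limit how quickly the budget is consumed.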

Expert and agent experiences are stored in a shared replay buffer and used to train an off-policy implicit quantile network (IQN) backbone, enabling efficient reuse of expert trajectories.
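A shared buffer of this kind might look like the sketch below. Because the learner is off-policy, transitions can be replayed regardless of whether the agent or the expert chose the action. Uniform sampling, the `from_expert` flag, and the names `Transition` and `SharedReplayBuffer` are assumptions for illustration; the abstract does not describe the sampling scheme.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", "state action reward next_state done from_expert"
)


class SharedReplayBuffer:
    """One buffer for both sources; the flag is kept only for bookkeeping."""

    def __init__(self, capacity=100_000, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done, from_expert=False):
        self.memory.append(
            Transition(state, action, reward, next_state, done, from_expert)
        )

    def sample(self, batch_size):
        # Uniform sampling over agent and expert transitions alike (assumed).
        return self.rng.sample(list(self.memory), batch_size)

    def expert_fraction(self):
        if not self.memory:
            return 0.0
        return sum(t.from_expert for t in self.memory) / len(self.memory)
```

Tracking `expert_fraction` is optional, but it gives a simple way to check how much of the training data currently comes from the expert.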

Experiments in CARLA show that our method outperforms the IQN baseline, improving the success rate by 5-7% and reducing failures. These results indicate that risk-sensitive uncertainty estimation coupled with regulated expert integration enables safer and more efficient exploration for sensor-based RL policy learning in unsignalized intersection navigation.