Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

Abstract: Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making it well suited to privacy-sensitive applications.

However, FedRL faces challenges in heterogeneous environments, where differing state-transition dynamics lead to non-identical input distributions and imbalanced parameter updates during aggregation.

To address this, this paper develops a personalized observation normalization (PON) method, in which each agent locally normalizes its raw state inputs using a continuously updated running mean and variance.
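
As an illustration of the running statistics described above, a per-agent normalizer might look like the following minimal sketch. The class name `RunningNormalizer`, the batch-wise merge of mean and variance, and the `eps` stabilizer are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


class RunningNormalizer:
    """Hypothetical per-agent running mean/variance tracker for observations."""

    def __init__(self, obs_dim, eps=1e-8):
        self.mean = np.zeros(obs_dim)
        self.var = np.ones(obs_dim)
        self.count = 0
        self.eps = eps  # assumed stabilizer to avoid division by zero

    def update(self, batch):
        """Merge a batch of observations into the running statistics."""
        batch = np.atleast_2d(np.asarray(batch, dtype=np.float64))
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]

        total = self.count + b_count
        delta = b_mean - self.mean
        # Combine the sums of squared deviations of the old data and the batch.
        m2 = (
            self.var * self.count
            + b_var * b_count
            + delta**2 * self.count * b_count / total
        )
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, obs):
        """Scale an observation using this agent's own statistics only."""
        return (np.asarray(obs) - self.mean) / np.sqrt(self.var + self.eps)
```

In this sketch each agent would call `update` on the observations it collects and pass `normalize(obs)` to its policy, so the statistics never leave the agent.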

This design ensures consistent scaling of local features, so that no agent's updates overshadow those of other agents during aggregation.

Furthermore, we demonstrate that sharing normalization parameters across agents is ineffective because local input distributions are diverse, which highlights the necessity of personalized statistics.
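
The distinction between shared policy parameters and personalized statistics can be sketched as follows. This is only an illustration under stated assumptions: the function `aggregate_policies`, the dictionary layout, the key names `obs_mean` and `obs_var`, and the unweighted average are all hypothetical, and the abstract does not specify the aggregation rule actually used.

```python
import numpy as np


def aggregate_policies(agent_states, local_keys=("obs_mean", "obs_var")):
    """Average shared parameters across agents; keep normalization stats local.

    agent_states: list of dicts mapping parameter names to NumPy arrays.
    local_keys: names of per-agent statistics that are never averaged (assumed).
    """
    shared_keys = [k for k in agent_states[0] if k not in local_keys]
    # Unweighted mean over agents (an assumption made for this sketch).
    global_params = {
        k: np.mean([state[k] for state in agent_states], axis=0)
        for k in shared_keys
    }
    # Each agent receives the global weights but retains its own statistics.
    return [{**state, **global_params} for state in agent_states]
```

Under this sketch, two agents whose observations differ in scale would end up with identical policy weights after aggregation while still normalizing inputs with their own mean and variance.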

Experiments on heterogeneous MuJoCo tasks show that PON accelerates training and achieves superior performance compared to baseline methods.