Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation


Abstract: Machine learning systems deployed in high-stakes socioeconomic settings routinely display bias. We formalize bias as symmetry breaking: a classifier is fair if its outputs remain invariant under the counterfactual operation of switching a sensitive attribute while merit features are held fixed, and biased if switching that attribute alone changes its output.
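The invariance criterion can be checked directly by counting how often a prediction changes when only the sensitive bit is switched. The sketch below is illustrative, not the paper's code: it assumes tabular rows with the sensitive attribute stored as a 0/1 value at a known column index, and the names `flip_attribute`, `violation_rate`, `biased`, and `fair`, as well as the toy data, are hypothetical.

```python
from typing import Callable, List, Sequence


def flip_attribute(row: Sequence[float], s: int) -> List[float]:
    """Counterfactual copy of `row`: the binary sensitive attribute at index `s`
    is switched (0 <-> 1) and every merit feature is left unchanged."""
    flipped = list(row)
    flipped[s] = 1.0 - flipped[s]
    return flipped


def violation_rate(predict: Callable[[Sequence[float]], int],
                   X: Sequence[Sequence[float]], s: int) -> float:
    """Fraction of rows whose predicted label changes under the flip."""
    changed = sum(predict(x) != predict(flip_attribute(x, s)) for x in X)
    return changed / len(X)


# Toy rows (hypothetical): column 0 is the sensitive bit, column 1 a merit score.
X = [[0.0, 0.4], [1.0, 0.4], [0.0, 0.8], [1.0, 0.8]]


def biased(x: Sequence[float]) -> int:
    # Gives members of group 1 a bonus on top of merit.
    return int(x[1] + 0.3 * x[0] > 0.6)


def fair(x: Sequence[float]) -> int:
    # Depends on the merit score only.
    return int(x[1] > 0.6)


print(violation_rate(biased, X, s=0))  # 0.5
print(violation_rate(fair, X, s=0))    # 0.0
```

A classifier satisfies the symmetry exactly when this rate is zero; the biased rule breaks it for the two rows whose merit score sits near the threshold.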

We implement loss-based regularization as a symmetry-restoring mechanism and evaluate the framework on four synthetic datasets with varying levels of noise, correlation, and bias. The framework reduces symmetry violations by more than 90% at an accuracy cost of roughly 5%.
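One way to realize loss-based symmetry restoration is to add a penalty on the gap between the model's output for each input and for its counterfactual twin. The sketch below rests on assumptions not taken from the paper: a plain logistic-regression model trained by full-batch gradient descent, a squared-difference penalty weighted by a hypothetical coefficient `lam`, a toy synthetic dataset with labels biased toward group 1, and "violation reduction" computed as `1 - violations_after / violations_before`. The numbers it prints are not the paper's results.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def flip(row: Sequence[float], s: int) -> List[float]:
    # Counterfactual copy: switch the binary sensitive attribute, keep merit features fixed.
    out = list(row)
    out[s] = 1.0 - out[s]
    return out


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(X: List[List[float]], y: List[int], s: int, lam: float,
          lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch gradient descent on
    mean BCE(y, p(x)) + lam * mean (p(x) - p(flip(x)))^2,
    where p(x) = sigmoid(w.x + b)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            xf = flip(x, s)
            p, pf = predict_proba(w, b, x), predict_proba(w, b, xf)
            # Cross-entropy gradient w.r.t. the logit is (p - t); the penalty
            # contributes 2*lam*(p - pf) * (dp/dtheta - dpf/dtheta).
            c = 2.0 * lam * (p - pf)
            for j in range(d):
                gw[j] += (p - t) * x[j] + c * (p * (1 - p) * x[j] - pf * (1 - pf) * xf[j])
            gb += (p - t) + c * (p * (1 - p) - pf * (1 - pf))
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def evaluate(w, b, X, y, s) -> Tuple[float, float]:
    # Returns (accuracy, fraction of rows whose label changes under the flip).
    preds = [int(predict_proba(w, b, x) > 0.5) for x in X]
    cf = [int(predict_proba(w, b, flip(x, s)) > 0.5) for x in X]
    acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    viol = sum(p != q for p, q in zip(preds, cf)) / len(y)
    return acc, viol


# Hypothetical synthetic data: column 0 is the sensitive bit, column 1 a merit
# score; labels favour group 1 by a fixed margin plus Gaussian noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    a, m = float(rng.random() < 0.5), rng.random()
    X.append([a, m])
    y.append(int(m + 0.4 * a + rng.gauss(0, 0.1) > 0.7))

w0, b0 = train(X, y, s=0, lam=0.0)   # unregularized baseline
w1, b1 = train(X, y, s=0, lam=10.0)  # symmetry-regularized model
acc0, viol0 = evaluate(w0, b0, X, y, 0)
acc1, viol1 = evaluate(w1, b1, X, y, 0)
reduction = 1 - viol1 / viol0 if viol0 else 0.0
print(f"baseline:    acc={acc0:.3f} violations={viol0:.3f}")
print(f"regularized: acc={acc1:.3f} violations={viol1:.3f} reduction={reduction:.1%}")
```

For a linear model, the logit gap between a row and its twin is just the weight on the sensitive column, so the penalty acts mainly by shrinking that weight toward zero. Because the labels themselves are biased, fitting them less closely is where the accuracy cost comes from.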

The framework does not require knowledge of the causal graph, is computationally lightweight, and generalizes to any sensitive attribute that can be defined as a bit-flip. This makes it suitable for contexts where local sources of discrimination are absent from mainstream benchmarks.
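Since the only ingredient the framework needs is the flip itself, covering another sensitive attribute amounts to choosing which binary columns to switch. The sketch assumes each attribute is already encoded as a 0/1 column; `make_flip` and the column layout `[attr_a, attr_b, merit_1, merit_2]` are hypothetical. The operator leaves every other feature untouched rather than propagating the change through a causal model, and applying it twice returns the original row.

```python
from typing import Callable, Iterable, List, Sequence


def make_flip(columns: Iterable[int]) -> Callable[[Sequence[float]], List[float]]:
    """Build the counterfactual operator for a set of binary sensitive columns.
    Each listed column is switched 0 <-> 1; every other feature is copied unchanged."""
    cols = frozenset(columns)

    def op(row: Sequence[float]) -> List[float]:
        return [1.0 - v if i in cols else v for i, v in enumerate(row)]

    return op


# Hypothetical layout: [attr_a, attr_b, merit_1, merit_2]
flip_a = make_flip([0])
flip_both = make_flip([0, 1])
row = [1.0, 0.0, 0.42, 0.87]

print(flip_a(row))     # [0.0, 0.0, 0.42, 0.87]
print(flip_both(row))  # [0.0, 1.0, 0.42, 0.87]
assert flip_both(flip_both(row)) == row  # the operation is its own inverse
```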