Orthogonal Concept Erasure for Diffusion Models
Abstract: Concept erasure has emerged as a promising approach to mitigate undesired or unsafe content in diffusion models, yet existing methods still face significant limitations. While training-based methods are effective, their high computational cost limits scalability. Editing-based methods are more efficient and deployment-friendly, yet they struggle to simultaneously achieve precise concept erasure and preserve overall generative capacity.
We trace this limitation to the reliance of editing-based methods on additive parameter updates. Our empirical analysis reveals that concept semantics depend primarily on neuron direction rather than neuron magnitude, while overall generative capacity relies on the angular geometry among neurons. Because additive updates inherently entangle direction, magnitude, and angular geometry, they inevitably introduce unintended interference between concept erasure and overall generation performance.
To address this, we propose Orthogonal Concept Erasure (OCE), which reformulates editing-based erasure as multiplicative parameter updates from a geometric perspective. Specifically, OCE applies layer-wise orthogonal transformations, obtained in closed form, to the parameters, enabling precise concept erasure while preserving neuron magnitude and angular geometry.
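The geometric claim above can be checked numerically. The following is a minimal sketch, not the OCE algorithm itself: it uses a random orthogonal matrix as a stand-in for the closed-form transformation, which the abstract does not specify. The weight matrix `W`, its shape, the rows-as-neurons convention, and the helper `pairwise_cosines` are all assumptions made for illustration. It compares a multiplicative orthogonal update against an additive perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weights: 4 neurons (rows), each an 8-dimensional vector.
W = rng.standard_normal((4, 8))


def pairwise_cosines(M):
    """Cosine similarity between every pair of rows (neurons) of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T


def row_norms(M):
    """Euclidean norm (magnitude) of each neuron."""
    return np.linalg.norm(M, axis=1)


# Random orthogonal matrix from a QR decomposition. This is only a stand-in
# for the closed-form orthogonal transformation described in the abstract.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

# Multiplicative update: every neuron w_i is mapped to Q @ w_i.
W_mult = W @ Q.T

# Additive update: W + Delta, with an arbitrary perturbation Delta.
W_add = W + 0.5 * rng.standard_normal(W.shape)

# The orthogonal update moves neuron directions but keeps magnitudes
# and pairwise angles intact.
mult_directions_changed = not np.allclose(W_mult, W)
mult_norms_preserved = np.allclose(row_norms(W_mult), row_norms(W))
mult_angles_preserved = np.allclose(pairwise_cosines(W_mult), pairwise_cosines(W))

# The additive update generally alters both magnitudes and angles.
add_norms_preserved = np.allclose(row_norms(W_add), row_norms(W))
add_angles_preserved = np.allclose(pairwise_cosines(W_add), pairwise_cosines(W))

print("orthogonal update:", mult_directions_changed, mult_norms_preserved, mult_angles_preserved)
print("additive update:  ", add_norms_preserved, add_angles_preserved)
```

The preservation follows from the identity (Q w_i)ᵀ(Q w_j) = w_iᵀ w_j for any orthogonal Q, so all inner products, and hence all norms and angles, are unchanged regardless of which orthogonal matrix is chosen.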
Furthermore, to resolve conflicting constraints in multi-concept erasure, OCE introduces a subspace-level objective with structured subspace manipulation, yielding more effective and scalable erasure. Extensive experiments on single- and multi-concept erasure demonstrate that OCE outperforms existing methods in both concept erasure and non-target preservation, erasing up to 100 concepts in 4.3 s.