Position: The Term "Machine Unlearning" Is Overused in LLMs

Abstract: Large language models increasingly face demands to “forget” training data, knowledge, or behaviors due to regulatory deletion obligations, copyright/licensing disputes, and safety or product-policy requirements.

This position paper argues that the term “machine unlearning” is overused in LLM research and should be reserved for dataset-defined deletion: removing the training influence of a precisely specified forget set so that the resulting model is approximately indistinguishable from one retrained without that data.
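
One way to make "approximately indistinguishable from retraining" concrete is the (ε, δ)-style condition used in the certified-removal literature. The notation below is an assumption introduced for illustration and is not taken from this paper: A is a randomized training algorithm, D the training set, D_f ⊆ D the forget set, U the unlearning procedure, and S any measurable set of models.

```latex
% Assumed notation (illustrative, not the paper's own):
%   A   = randomized training algorithm
%   D   = full training set,  D_f \subseteq D = forget set
%   U   = unlearning procedure,  S = any measurable set of models
\[
\Pr\big[\,U(A(D), D, D_f) \in S\,\big]
  \;\le\; e^{\varepsilon}\,\Pr\big[\,A(D \setminus D_f) \in S\,\big] + \delta ,
\]
\[
\Pr\big[\,A(D \setminus D_f) \in S\,\big]
  \;\le\; e^{\varepsilon}\,\Pr\big[\,U(A(D), D, D_f) \in S\,\big] + \delta .
\]
```

Setting ε = δ = 0 recovers exact unlearning, where the unlearned model has the same distribution as one retrained on D without D_f. The condition can only be stated once D_f is specified, which is why a task with no defined forget set falls outside this sense of the term.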

We contend that many tasks currently labeled “unlearning” (e.g., refusal for harmful requests, entity/knowledge removal, or targeted suppression) pursue different, often policy-dependent objectives and therefore require different terminology and baselines (e.g., alignment, suppression, editing, obfuscation).

We further argue that this confusion is not cosmetic. Because papers make different implicit guarantees under the same label, metrics and benchmarks are frequently reused outside their intended scope. The result rewards surface-level non-disclosure (e.g., low ROUGE or forget-set accuracy) even when retraining equivalence is not tested and derived capabilities remain.
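
The gap between surface-level non-disclosure and retraining equivalence can be illustrated with a toy sketch. Everything here is hypothetical: the canned strings stand in for real model generations, `rouge1_recall` is a simplified unigram-recall stand-in for ROUGE, and `retrain_gap` is an illustrative comparison against a retrained reference, not a metric proposed by the paper.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified unigram recall: share of reference tokens found in candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


# Hypothetical forget-set text and canned outputs (assumptions for illustration).
FORGET_TEXT = "alice lives at 12 elm street"

# A model that refuses the direct probe but still reveals the text when rephrased.
SUPPRESSED = {
    "direct": "i cannot share that",
    "paraphrase": "alice lives at 12 elm street",
}

# A reference model retrained without the forget set.
RETRAINED = {
    "direct": "i do not know",
    "paraphrase": "i do not know",
}


def leakage(outputs: dict[str, str]) -> dict[str, float]:
    """Per-probe overlap between a model's outputs and the forget text."""
    return {probe: rouge1_recall(text, FORGET_TEXT) for probe, text in outputs.items()}


def retrain_gap(model_outputs: dict[str, str], reference_outputs: dict[str, str]) -> float:
    """Largest per-probe difference in leakage between a model and the reference."""
    model = leakage(model_outputs)
    reference = leakage(reference_outputs)
    return max(abs(model[probe] - reference[probe]) for probe in model)


if __name__ == "__main__":
    print("direct-probe leakage:", leakage(SUPPRESSED)["direct"])
    print("gap to retrained reference:", retrain_gap(SUPPRESSED, RETRAINED))
```

Scored on the direct probe alone, the suppressed model looks as if it has forgotten the text. Comparing against the retrained reference across both probes exposes the difference, which is the kind of check a retraining-equivalence claim would need.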

We conclude by calling for stricter terminology tied to explicit guarantees and reference models, and for evaluations that match the claimed objective.
