RULER: Representation-Level Verification of Machine Unlearning

Abstract: Machine unlearning aims to remove the influence of specific training records from a deployed model without retraining from scratch. Current protocols verify this at the output level through membership inference, retain accuracy, and forget-set accuracy, but a model can satisfy all three whilst still encoding forgotten records in its intermediate representations.

We introduce RULER, a set of representation-level verification metrics. The oracle-comparative metric M2 measures whether forget-set records occupy the same representational position as in a model retrained without them. The oracle-free metric M4 detects residuals from the unlearned model’s internal similarity structure alone, without retraining.
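
To make the oracle-comparative idea concrete, here is a minimal sketch of one way to compare forget-set representations between an unlearned model and a retrained oracle. It is an illustration under stated assumptions, not the definition of M2: the abstract does not specify the metric, so the sketch swaps in linear centered kernel alignment (CKA) as a stand-in similarity measure. The names `linear_cka` and `oracle_gap` and the synthetic activations are hypothetical.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_records, n_features)."""
    # Centre each feature so the comparison ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def oracle_gap(unlearned_acts: np.ndarray, oracle_acts: np.ndarray) -> float:
    """Hypothetical proxy (NOT the paper's M2): 0 means the forget-set geometry
    matches the retrained oracle; larger values suggest a representational residual."""
    return 1.0 - linear_cka(unlearned_acts, oracle_acts)


# Synthetic stand-ins for forget-set activations taken from one layer.
rng = np.random.default_rng(0)
oracle_acts = rng.normal(size=(200, 32))

# Same geometry as the oracle up to a rotation of the feature axes.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
same_geometry = oracle_acts @ rotation

# Activations with no relationship to the oracle's.
unrelated = rng.normal(size=(200, 32))

gap_same = oracle_gap(same_geometry, oracle_acts)
gap_unrelated = oracle_gap(unrelated, oracle_acts)
print(f"gap (rotated oracle): {gap_same:.4f}")
print(f"gap (unrelated):      {gap_unrelated:.4f}")
```

Linear CKA is invariant to rotations of the feature space, so two models that encode the forget set identically up to a change of basis score a gap near zero. Whether RULER uses this or a different alignment is not stated in the abstract.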

Four approximate unlearning methods all pass output-level evaluation, yet, under a linear mixed-effects model, M2 detects significant residuals in 10 of 12 conditions (p < 0.05), with effect sizes growing as the forget fraction increases. A fifth method, Bad Teacher, shows the same residuals despite using a different forgetting mechanism.

M4 acts as a pre-unlearning diagnostic across tabular, image, clinical text, and face-identity settings: it detects identity-level memorisation in face recognition models where no tested method fully erases the signal.
