Can Editing 1 Neuron Fix Repetition Loops in LLMs?

Abstract: Yes. Can it cure doom loops? Probably not. The Gemma 4 instruction-tuned models share a reproducible failure: on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokémon, they collapse into repetition, producing either a tight verbatim loop or a list whose entries degenerate into a single repeated answer. These loops occur at rates as high as 95% and persist across prompt rewording, inference-engine changes, and most sampling adjustments.
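To make the "tight verbatim loop" case concrete, here is a minimal sketch of how such a loop could be flagged in a generated token sequence and turned into a loop rate over many generations. It is an illustration only, not the detection criterion used in this work: the function names and the thresholds (`max_period`, `min_repeats`) are assumptions, and the check does not catch the second failure mode, where list entries differ in numbering but converge on one answer.

```python
def find_verbatim_loop(tokens, max_period=50, min_repeats=3):
    """Return the period of a verbatim loop at the end of `tokens`, or None.

    A loop of period p is reported when the last p tokens appear
    `min_repeats` times back to back at the end of the sequence.
    Both thresholds are illustrative assumptions.
    """
    n = len(tokens)
    for period in range(1, max_period + 1):
        if period * min_repeats > n:
            break  # not enough tokens left to hold this many repeats
        unit = tokens[n - period:]
        if all(
            tokens[n - (k + 1) * period : n - k * period] == unit
            for k in range(min_repeats)
        ):
            return period
    return None


def loop_rate(generations, **kwargs):
    """Fraction of generations that end in a verbatim loop."""
    flagged = sum(find_verbatim_loop(g, **kwargs) is not None for g in generations)
    return flagged / len(generations)


# Toy example: the second half of this sequence repeats ["x", "y"].
looping = ["a", "b", "c", "x", "y", "x", "y", "x", "y"]
healthy = ["a", "b", "c", "d"]
print(find_verbatim_loop(looping), find_verbatim_loop(healthy))
print(loop_rate([looping, healthy]))
```

Checking only the tail of the sequence keeps the test cheap and matches the intuition that a looping generation stays in the loop until the budget runs out; a production check would likely operate on token IDs and use a longer minimum repeat count.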

In this paper, we explore whether this behavior is localized enough to be removed by weight edits. To find the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts), which we suppress with static weight edits. These “surgeries” can be as small as a single sign-inverted neuron (in the E2B model).
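As a rough illustration of what a single-neuron static weight edit can look like, the sketch below builds a toy gated MLP in NumPy and rescales one hidden neuron's output row in the down-projection. A factor of 0 ablates the neuron and a factor of -1 inverts its sign. Everything here is an assumption made for illustration: the gated-GELU layout, the matrix shapes, the neuron index, and the choice to edit the down-projection rather than another weight are not taken from the paper's actual procedure.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def hidden_activations(x, w_gate, w_up):
    """Per-neuron hidden activations of a toy gated MLP."""
    return gelu(x @ w_gate) * (x @ w_up)


def mlp(x, w_gate, w_up, w_down):
    """Toy gated MLP: each hidden neuron writes along one row of w_down."""
    return hidden_activations(x, w_gate, w_up) @ w_down


def scale_neuron(w_down, neuron, factor):
    """Return a copy of w_down with one neuron's output row rescaled.

    factor=0.0 ablates the neuron; factor=-1.0 inverts its sign.
    """
    edited = w_down.copy()
    edited[neuron] *= factor
    return edited


rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                      # toy sizes, not real model dimensions
x = rng.normal(size=(4, d_model))          # 4 token positions
w_gate = rng.normal(size=(d_model, d_ff))
w_up = rng.normal(size=(d_model, d_ff))
w_down = rng.normal(size=(d_ff, d_model))

neuron = 5                                 # hypothetical neuron index
w_inverted = scale_neuron(w_down, neuron, -1.0)

baseline = mlp(x, w_gate, w_up, w_down)
edited = mlp(x, w_gate, w_up, w_inverted)

# Sign inversion removes the neuron's contribution and adds it back negated,
# so the output shifts by -2 * activation * output_row.
h = hidden_activations(x, w_gate, w_up)[:, neuron]
expected = baseline - 2.0 * np.outer(h, w_down[neuron])
print(np.allclose(edited, expected))
```

Because the edit touches only one row of one matrix, it is static: it is applied once to the stored weights and needs no hooks or extra computation at inference time. In the same toy setting, per-neuron ablation amounts to calling `scale_neuron` with a factor of 0 and measuring how the output changes.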

The size of the effective edits grows with model scale, but in all cases the loop patterns can be suppressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, i.e., a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall, exhausting the budget without committing to a final answer.

We show that this residual failure is reduced but not eliminated by the same edits, and argue that it is fundamentally a knowledge-precision problem rather than a removable circuit: weight surgery can delete a loop, but it cannot supply a missing fact. Our results serve both as a feasibility demonstration (evidence that a concrete generation pathology can be localized to a few parameters and edited out) and as a delineation of where that approach stops.
