Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Large language models (LLMs) perform inference as a fixed-depth, fixed-order, non-recurrent execution of all layers. We reveal the wide existence of a training-free, flexible, and dynamic program-of-layers (PoLar), in which pretrained layers are packed into modules that are then skipped or looped to form a customized program for each input.
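To make the notion of a program over layers concrete, here is a minimal sketch in Python. It is an illustration only: the toy scalar "layers", the `run_program` helper, and the index-list encoding of a program are assumptions made for this example, not the representation used in the paper.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a pretrained layer (or module): a function on a hidden state.
Layer = Callable[[float], float]


def run_program(layers: Sequence[Layer], program: Sequence[int], x: float) -> float:
    """Apply layers in the order listed in `program`.

    An index that is absent from `program` is skipped;
    an index that appears more than once is looped.
    """
    for idx in program:
        x = layers[idx](x)
    return x


# Four toy layers acting on a scalar "hidden state" (assumed for illustration).
layers = [
    lambda h: h + 1.0,
    lambda h: h * 2.0,
    lambda h: h - 3.0,
    lambda h: h * h,
]

standard_program = list(range(len(layers)))  # [0, 1, 2, 3]: the usual forward pass
custom_program = [0, 1, 1, 3]                # loop layer 1, skip layer 2

standard_out = run_program(layers, standard_program, 1.0)
custom_out = run_program(layers, custom_program, 1.0)
print(standard_out, custom_out)
```

Both programs reuse the same frozen layers; only the execution order differs, which is what makes the approach training-free with respect to the base model.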

For most inputs, a substantially shorter program achieves the same or better accuracy, and incorrect predictions of the original LLM can be corrected by alternative programs with fewer layers. These observations indicate that inference admits multiple valid latent computations beyond the standard forward pass.

To achieve PoLar efficiently in practice, we propose a lightweight PoLar prediction network that learns to generate execution programs, dynamically skipping or repeating pretrained layers for each input.
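As a rough illustration of what such a predictor could look like, the sketch below maps an input feature vector to a per-module repeat count and expands the counts into a program. Everything here is hypothetical: the linear scorer, the action set (0 = skip, 1 = run once, 2 = run twice), the hand-set weights, and the function names are assumptions for this example and do not describe the network proposed in the paper.

```python
from typing import List, Sequence

# Assumed action set: how many times each module is executed.
ACTIONS = (0, 1, 2)  # 0 = skip, 1 = run once, 2 = loop (run twice)


def predict_counts(
    features: Sequence[float],
    weights: Sequence[Sequence[Sequence[float]]],
    biases: Sequence[Sequence[float]],
) -> List[int]:
    """Pick one action per module with a linear scorer.

    weights[m][a] is the weight vector for module m and action a;
    biases[m][a] is the matching bias. The highest-scoring action wins.
    """
    counts = []
    for w_m, b_m in zip(weights, biases):
        scores = [
            sum(wi * fi for wi, fi in zip(w, features)) + b
            for w, b in zip(w_m, b_m)
        ]
        counts.append(ACTIONS[scores.index(max(scores))])
    return counts


def counts_to_program(counts: Sequence[int]) -> List[int]:
    """Expand per-module repeat counts into an ordered list of module indices."""
    program: List[int] = []
    for module_idx, count in enumerate(counts):
        program.extend([module_idx] * count)
    return program


# Hand-set (untrained) parameters for three modules and two input features.
features = [1.0, 0.0]
weights = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],  # module 0: favors "run once"
    [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]],  # module 1: favors "loop"
    [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # module 2: favors "skip"
]
biases = [[0.0, 0.0, 0.0] for _ in weights]

counts = predict_counts(features, weights, biases)
program = counts_to_program(counts)
print(counts, program)
```

In a real system the scorer would be trained and would read features derived from the input, while the base model's layers stay frozen; the point of the sketch is only that the predictor is small relative to the model it controls.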

Experiments on mathematical reasoning benchmarks demonstrate that PoLar consistently improves accuracy over standard inference and prior dynamic-depth methods, often while executing fewer layers, and that these gains persist under out-of-distribution evaluation. Our results suggest that fixed-depth execution captures only a narrow subset of an LLM’s latent reasoning capacity.