Large Language Models Explore by Latent Distilling

Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-level lexical variation, limiting semantic exploration.

In this paper, we propose Exploratory Sampling (ESamp), a decoding approach that explicitly encourages semantic diversity during generation. ESamp is motivated by the well-known observation that neural networks tend to make lower-error predictions on inputs similar to those encountered before, and incur higher prediction error on novel ones.

Building on this property, we train a lightweight Distiller at test time to predict the LLM's deep-layer hidden representations from its shallow-layer representations, thereby modeling the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the mappings induced by the current generation context.
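
As a rough illustration of this idea, the sketch below trains a tiny online predictor that maps a shallow-layer vector to a deep-layer vector and reports its prediction error. Everything here is an assumption made for illustration: the abstract does not specify the Distiller's architecture, loss, or optimizer, so the class name `LinearDistiller`, the linear map, the squared-error loss, the plain SGD update, and the toy vectors are all hypothetical stand-ins.

```python
class LinearDistiller:
    """Hypothetical stand-in for the Distiller: a linear map from
    shallow-layer features to deep-layer features, trained online."""

    def __init__(self, d_shallow, d_deep, lr=0.1):
        # Weight matrix of shape (d_deep, d_shallow), initialised to zero.
        self.W = [[0.0] * d_shallow for _ in range(d_deep)]
        self.lr = lr

    def predict(self, shallow):
        return [sum(w * x for w, x in zip(row, shallow)) for row in self.W]

    def error(self, shallow, deep):
        # Mean squared prediction error, used here as the novelty signal.
        pred = self.predict(shallow)
        return sum((p - t) ** 2 for p, t in zip(pred, deep)) / len(deep)

    def update(self, shallow, deep):
        # One SGD step on 0.5 * ||W @ shallow - deep||^2.
        pred = self.predict(shallow)
        for i, row in enumerate(self.W):
            residual = pred[i] - deep[i]
            for j in range(len(row)):
                row[j] -= self.lr * residual * shallow[j]


# Toy (shallow, deep) pairs standing in for real LLM hidden states.
seen_pair = ([1.0, 0.0, 0.5], [0.5, -1.0])
novel_pair = ([0.0, 1.0, 0.0], [1.0, 1.0])

distiller = LinearDistiller(d_shallow=3, d_deep=2)
for _ in range(50):
    distiller.update(*seen_pair)  # adapt to the pattern seen so far

seen_error = distiller.error(*seen_pair)    # small: familiar pattern
novel_error = distiller.error(*novel_pair)  # large: unexplored pattern
print(f"seen={seen_error:.6f} novel={novel_error:.6f}")
```

After repeated updates on one pattern, the error on that pattern shrinks while the error on an unseen pattern stays high, which is the property the method relies on.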

ESamp uses the Distiller's prediction error as a novelty signal to reweight candidate token extensions conditioned on the current prefix, thereby biasing decoding toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, with less than 5% worst-case overhead (1.2% in the optimized release).
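
One simple way to picture the reweighting step is an additive bonus on the candidate logits before the softmax. This is only a sketch under stated assumptions: the abstract does not give the actual reweighting rule, so the additive form, the scale parameter `beta`, the function name `reweight`, and the toy numbers are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def reweight(logits, novelty, beta=1.0):
    """Assumed rule: add beta * novelty to each candidate's logit,
    then renormalise. Higher novelty means more probability mass."""
    adjusted = [l + beta * n for l, n in zip(logits, novelty)]
    return softmax(adjusted)


# Toy example: three candidate tokens; only the third is novel.
logits = [2.0, 1.0, 1.0]
novelty = [0.0, 0.0, 1.0]

base_probs = softmax(logits)
new_probs = reweight(logits, novelty, beta=1.0)
print(base_probs, new_probs)
```

Setting `beta=0` recovers ordinary sampling, so in this sketch `beta` controls how strongly decoding is pushed toward candidates the predictor finds unfamiliar.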

Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, performing better than or comparably to strong stochastic and heuristic baselines. Notably, ESamp generalizes robustly across mathematics, science, and code generation benchmarks and breaks the trade-off between diversity and coherence in creative writing.
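
For readers unfamiliar with the metric, Pass@k is the probability that at least one of k sampled responses is correct. The sketch below uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples of which c are correct. Whether this exact estimator is used in the evaluation is an assumption, since the abstract does not say; the function name `pass_at_k` and the sample counts are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased Pass@k estimate from n samples with c correct ones."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts: 10 samples, 2 of them correct.
p1 = pass_at_k(10, 2, 1)
p5 = pass_at_k(10, 2, 5)
print(f"pass@1={p1:.4f} pass@5={p5:.4f}")
```

More diverse samples tend to help this metric only when the extra variety reaches different solution strategies, which is why semantic rather than lexical diversity matters here.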