From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons
Abstract: Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained autoregressive (AR) models. This incompatibility precludes reusing robust AR priors and forces prohibitively expensive pre-training from scratch.
To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm. By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training.
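The structural mismatch mentioned above can be made concrete with attention masks. The following is a minimal illustration using the standard definitions of causal and bidirectional masks; it is not taken from the FLUID code, and the function names (`causal_mask`, `bidirectional_mask`, `future_only_entries`) are hypothetical.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True for _ in range(n)] for _ in range(n)]


def future_only_entries(n):
    """Count (i, j) pairs that a bidirectional model sees but a causal one hides.

    These are the attention links an AR checkpoint was never trained to use.
    """
    causal = causal_mask(n)
    full = bidirectional_mask(n)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if full[i][j] and not causal[i][j]
    )


if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("1" if allowed else "0" for allowed in row))
    print(future_only_entries(4))
```

For a sequence of length n, the two masks differ in n(n-1)/2 entries. A model that keeps the causal mask, as the abstract says FLUID does, has no such entries to learn from scratch, which is one plausible reason a GPT-style checkpoint could be reused directly.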
Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules.
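The abstract does not say how the stride is computed. As one possible reading, the sketch below treats the Shannon entropy of each position's predictive distribution as the "local information density" and extends the stride until a cumulative entropy budget is exhausted. The rule itself, the function `elastic_stride`, and the parameters `budget` and `max_stride` are assumptions made for illustration, not the authors' method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(dists, budget, max_stride):
    """Hypothetical stride rule, not taken from the paper.

    dists: predictive distributions for the upcoming positions, in order.
    budget: cumulative entropy allowed within one denoising step (assumed knob).
    max_stride: hard cap on positions denoised together (assumed knob).

    Returns how many positions to denoise in this step. At least one position
    is always taken when any remain, so generation cannot stall.
    """
    if not dists:
        return 0
    total = 0.0
    stride = 0
    for probs in dists[:max_stride]:
        total += entropy(probs)
        if total > budget and stride >= 1:
            break
        stride += 1
    return stride


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]  # low entropy: easy position
    uncertain = [0.25, 0.25, 0.25, 0.25]  # high entropy: hard position
    print(elastic_stride([confident] * 4, budget=1.0, max_stride=8))
    print(elastic_stride([uncertain] * 4, budget=1.0, max_stride=8))
```

Under this rule, low-entropy (predictable) regions are covered with long strides and high-entropy regions fall back to near token-by-token steps, which is the qualitative behavior a fixed schedule cannot provide.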
Experiments demonstrate that FLUID achieves state-of-the-art performance while reducing training costs by orders of magnitude, effectively reconciling established AR foundations with efficient parallel generation. Our code is available at this https URL.
Paper Details:
- Authors: Xiangyu Ma, Teng Xiao, Zuchao Li, Lefei Zhang
- arXiv ID: 2605.27387
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Submission Date: 11 Apr 2026