FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

Abstract: Diffusion Large Language Models (dLLMs) refine tokens iteratively but commit them irreversibly, which creates a “stability lag”: early decisions remain fragile even after they have been written. We show that Post-Training Quantization (PTQ) error readily flips these borderline decisions at the write frontier, and that the flipped decisions are then permanently locked in and amplified.

To address this, we propose Frontier-Aware Instability-Reweighted Calibration (FAIR-Calib), a two-stage PTQ framework for dLLMs. Stage I probes a full-precision teacher to estimate a position prior that combines frontier hits and masked-stage reliability. Stage II performs off-policy, layer-wise calibration by minimizing a reweighted hidden-state MSE, effectively prioritizing the protection of fragile frontier states without requiring expensive end-to-end diffusion rollouts.
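
The reweighted hidden-state MSE of Stage II can be illustrated with a minimal sketch. It assumes the Stage I position prior is already available as one non-negative weight per token position, and that the weights are normalized to sum to one; the function name `reweighted_hidden_mse`, the normalization, and the list-of-lists layout are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence


def reweighted_hidden_mse(
    teacher_hidden: Sequence[Sequence[float]],
    quant_hidden: Sequence[Sequence[float]],
    position_weights: Sequence[float],
) -> float:
    """Position-weighted MSE between teacher and quantized hidden states.

    teacher_hidden / quant_hidden: one hidden vector per token position.
    position_weights: hypothetical Stage I prior, one weight per position
    (larger weight = more fragile position). Normalizing the weights to
    sum to one is an assumption made for this sketch.
    """
    total = sum(position_weights)
    if total <= 0:
        raise ValueError("position_weights must have a positive sum")

    loss = 0.0
    for w, t_vec, q_vec in zip(position_weights, teacher_hidden, quant_hidden):
        # Mean squared error over the hidden dimension at this position.
        per_pos = sum((t - q) ** 2 for t, q in zip(t_vec, q_vec)) / len(t_vec)
        # Scale by the normalized weight of this position.
        loss += (w / total) * per_pos
    return loss


# Two positions; the second carries three times the weight of the first.
teacher = [[1.0, 0.0], [0.0, 0.0]]
quant = [[0.0, 0.0], [0.0, 2.0]]
weighted = reweighted_hidden_mse(teacher, quant, [1.0, 3.0])
uniform = reweighted_hidden_mse(teacher, quant, [1.0, 1.0])
```

With uniform weights this reduces to an ordinary per-position MSE average; raising the weight of a position makes errors there dominate the calibration loss, which is the sense in which fragile frontier states are prioritized.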

We further justify our weighted objective theoretically as a surrogate for the output KL divergence. Empirically, FAIR-Calib consistently outperforms state-of-the-art baselines on LLaDA and Dream under W4A4 quantization (4-bit weights and 4-bit activations), significantly reducing frontier decision flips and suppressing post-commit mismatches across diverse benchmarks.
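
One plausible way to count frontier decision flips is sketched below. It assumes a flip means that the quantized model's top-1 token differs from the full-precision teacher's at a position on the write frontier; this definition, the helper names, and the logit layout are assumptions made for illustration and may differ from the metric used in the paper.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def frontier_flip_rate(
    teacher_logits: Sequence[Sequence[float]],
    quant_logits: Sequence[Sequence[float]],
    frontier_positions: Sequence[int],
) -> float:
    """Fraction of frontier positions where the top-1 token changes.

    teacher_logits / quant_logits: one logit vector per token position.
    frontier_positions: positions about to be committed (hypothetical input).
    """
    if not frontier_positions:
        return 0.0
    flips = sum(
        argmax(teacher_logits[p]) != argmax(quant_logits[p])
        for p in frontier_positions
    )
    return flips / len(frontier_positions)


# Position 1 has a small teacher margin (0.1 vs 0.2), so a small
# perturbation of the logits is enough to flip its decision.
teacher_logits = [[2.0, 1.0], [0.1, 0.2], [3.0, 0.0]]
quant_logits = [[1.9, 1.1], [0.2, 0.1], [2.5, 0.5]]
rate = frontier_flip_rate(teacher_logits, quant_logits, [0, 1])
```

The example also shows why borderline decisions are the vulnerable ones: position 0 absorbs a perturbation of the same size without changing its top-1 token because its margin is larger.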
