FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts
Parameter-efficient fine-tuning (PEFT) reparameterizes weight updates in a fixed basis: low-rank adapters operate in the spatial domain, while a recent line of spectral methods operates in a fixed Fourier domain. We argue that the choice of domain is itself a design degree of freedom that should be learned, and that no single basis is optimal across tasks, layers, or tokens.
We introduce Fractional-Fourier Mixture of Experts, a mixture-of-experts adapter in which every expert carries a learnable fractional-Fourier order whose transform continuously interpolates between the spatial domain (recovering vanilla LoRA) and the Fourier domain (recovering a spectral adapter). Routing tokens through experts that occupy different points on this spatial-spectral continuum lets the model place each low-rank update in the domain where it is most compact. Because fractional-Fourier operators of different orders are mutually incoherent, the experts are also naturally decorrelated, which reduces interference between them and improves multi-task composition.
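The spatial-to-Fourier continuum can be made concrete with a fractional matrix power of the unitary DFT. The sketch below assumes NumPy and expands $F^{a} = \sum_{k=0}^{3} e^{-i\pi k a/2} P_k$, where $P_k = \tfrac14\sum_{m=0}^{3} (-i)^{-km} F^{m}$ projects onto the eigenspace of $F$ with eigenvalue $(-i)^k$. Order $a=0$ gives the identity (spatial domain), $a=1$ gives the DFT (Fourier domain), and orders compose additively. This is only one of several valid fractional powers of the DFT. It was chosen because it is exact and easy to check, and it is not presented as the operator the paper uses. The helper names `frft_matrix` and `mutual_coherence` are hypothetical. The second helper measures the mutual coherence of two orthonormal bases, which is the quantity behind the incoherence argument.

```python
import numpy as np


def frft_matrix(d: int, a: float) -> np.ndarray:
    """Return one fractional power F^a of the unitary d-point DFT matrix F.

    Hypothetical helper for illustration only: F^a is expanded over the four
    spectral projectors of F (F^4 = I), so F^0 = I (spatial domain),
    F^1 = F (Fourier domain), F^a @ F^b = F^(a+b), and F^a is unitary.
    It is not necessarily the operator used by the paper.
    """
    F = np.fft.fft(np.eye(d), norm="ortho")  # unitary DFT matrix
    powers = [np.eye(d, dtype=complex)]
    for _ in range(3):
        powers.append(powers[-1] @ F)  # I, F, F^2, F^3
    out = np.zeros((d, d), dtype=complex)
    for k in range(4):
        lam = (-1j) ** k  # eigenvalue exp(-i*pi*k/2) of F
        # P_k = (1/4) * sum_m lam^(-m) F^m projects onto the lam-eigenspace of F.
        P_k = sum(lam ** (-m) * powers[m] for m in range(4)) / 4
        # Raise the eigenvalue to the power a using the phase -pi*k/2.
        out += np.exp(-1j * np.pi * k * a / 2) * P_k
    return out


def mutual_coherence(U: np.ndarray, V: np.ndarray) -> float:
    """Largest |<u_i, v_j>| over columns of two orthonormal bases U and V."""
    return float(np.max(np.abs(U.conj().T @ V)))


d = 16
spatial, fourier, half = frft_matrix(d, 0.0), frft_matrix(d, 1.0), frft_matrix(d, 0.5)
print(mutual_coherence(spatial, fourier))  # ~0.25 = 1/sqrt(d)
print(np.allclose(half @ half, fourier))   # True: two half-order steps give the DFT
```

The spatial and Fourier bases reach the minimum possible coherence $1/\sqrt{d}$ for $d$-dimensional orthonormal bases. The coherence between intermediate orders depends on which fractional power is used, so it should be checked numerically for the actual operator rather than assumed.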
The order is a single scalar per expert, trained with a separate optimizer, and the transform is computed with an $\mathcal{O}(d\log d)$ chirp-FFT surrogate, so Fractional-Fourier Mixture of Experts adds negligible cost over standard MoE-LoRA.
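One way to see how a chirp-FFT pipeline reaches $\mathcal{O}(d\log d)$ is Bluestein's chirp-z identity $nk = \tfrac12\left(n^2 + k^2 - (k-n)^2\right)$. It rewrites a transform with kernel $e^{-2\pi i \alpha nk}$ as three steps: a chirp multiplication, an FFT-based convolution with a chirp, and a second chirp multiplication. The sketch below assumes NumPy and uses hypothetical names (`fractional_dft_chirp`, `fractional_dft_direct`, `alpha`). It computes the "fractional DFT" in the sense of Bailey and Swarztrauber, which is related to but not the same operator as the angular fractional Fourier transform. It illustrates the chirp-FFT mechanics only and is not the paper's surrogate.

```python
import numpy as np


def fractional_dft_chirp(x: np.ndarray, alpha: float) -> np.ndarray:
    """X_k = sum_n x_n exp(-2j*pi*alpha*n*k), k = 0..d-1, in O(d log d).

    Illustrative only: this is the chirp-z "fractional DFT", not the angular
    FrFT surrogate used in the paper. Bluestein's identity
    n*k = (n^2 + k^2 - (k - n)^2) / 2 turns the sum into
    chirp multiply -> FFT convolution with a chirp -> chirp multiply.
    """
    x = np.asarray(x, dtype=complex)
    d = x.shape[0]
    idx = np.arange(d).astype(float)
    chirp = np.exp(-1j * np.pi * alpha * idx ** 2)  # exp(-i*pi*alpha*n^2)

    # Zero-pad to a power of two >= 2d - 1 so circular convolution equals linear convolution.
    size = 1 << (2 * d - 2).bit_length()
    a = np.zeros(size, dtype=complex)
    a[:d] = x * chirp
    b = np.zeros(size, dtype=complex)
    b[:d] = np.conj(chirp)                       # lags 0..d-1
    b[size - d + 1:] = np.conj(chirp[1:][::-1])  # lags -(d-1)..-1 (the chirp is even)

    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return chirp * conv[:d]


def fractional_dft_direct(x: np.ndarray, alpha: float) -> np.ndarray:
    """O(d^2) reference implementation of the same sum."""
    x = np.asarray(x, dtype=complex)
    idx = np.arange(x.shape[0])
    return np.exp(-2j * np.pi * alpha * np.outer(idx, idx)) @ x


rng = np.random.default_rng(0)
x = rng.standard_normal(10)
print(np.allclose(fractional_dft_chirp(x, 1 / 10), np.fft.fft(x)))  # True: alpha = 1/d is the DFT
print(np.allclose(fractional_dft_chirp(x, 0.3), fractional_dft_direct(x, 0.3)))  # True
```

With `alpha = 1/d` the result matches `np.fft.fft(x)`. Other values of `alpha` give fractional or zoomed variants of the DFT at the same FFT-dominated cost, which is the cost profile behind an $\mathcal{O}(d\log d)$ chirp-FFT transform.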
Across commonsense, mathematical, code, and knowledge benchmarks on LLaMA-3.1-8B and Qwen2.5-7B, Fractional-Fourier Mixture of Experts improves over strong MoE-LoRA and spectral baselines (including FlyLoRA, FourierMoE, and HMoRA) while keeping the active-parameter budget small. Analysis further shows that the learned orders specialize by task and layer in interpretable ways.