Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

Abstract: Data mixing decides how to combine different sources or types of data and is a consequential problem throughout language model training. In pretraining, data composition is a key determinant of model quality; in continual learning and adaptation, it governs what is retained and acquired. Yet existing data mixing methods address only one phase of this lifecycle at a time: some require smaller proxy models tied to a single training phase, others assume a fixed domain set, and continual learning lacks principled guidance altogether.

We argue that data mixing is fundamentally an online decision-making problem, one that recurs throughout training and demands a single, unified solution. We introduce OP-Mix (On-Policy Mix), a data mixing algorithm that operates across the entire language model training lifecycle. Our main insight is that candidate data mixtures can be cheaply simulated by interpolating between low-rank adapters trained directly on the current model, eliminating separate proxy models and ensuring the search is always grounded in the model’s actual learning dynamics.
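To make the adapter-interpolation idea concrete, the following is a minimal toy sketch, not the OP-Mix implementation. It assumes a tiny linear model in place of a language model, one rank-1 update per data domain standing in for a low-rank adapter trained on the current model, and an exhaustive grid over mixture weights as the search procedure. The abstract does not specify how OP-Mix searches, so the grid is purely an illustrative assumption, and every name here (`interpolate_adapters`, `search_mixture`, `simplex_grid`, and so on) is hypothetical.

```python
from itertools import product


def outer(u, v):
    """Rank-1 matrix u v^T as a nested list (stand-in for one low-rank adapter)."""
    return [[ui * vj for vj in v] for ui in u]


def interpolate_adapters(deltas, weights):
    """Weighted sum of per-domain adapter updates; all deltas share one shape."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [
        [sum(w * d[r][c] for w, d in zip(weights, deltas)) for c in range(cols)]
        for r in range(rows)
    ]


def apply_delta(base, delta):
    """Add an adapter update to the base weights without modifying the base."""
    return [[b + d for b, d in zip(brow, drow)] for brow, drow in zip(base, delta)]


def mse(weight_matrix, examples):
    """Mean squared error of the linear model y = W x over (x, y) pairs."""
    total, count = 0.0, 0
    for x, y in examples:
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in weight_matrix]
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
        count += len(y)
    return total / count


def simplex_grid(num_domains, steps):
    """Yield weight tuples with entries k/steps that sum to 1."""
    for combo in product(range(steps + 1), repeat=num_domains):
        if sum(combo) == steps:
            yield tuple(k / steps for k in combo)


def search_mixture(base, deltas, validation, steps=4):
    """Score each candidate mixture via adapter interpolation; return the best."""
    best = None
    for weights in simplex_grid(len(deltas), steps):
        candidate = apply_delta(base, interpolate_adapters(deltas, weights))
        loss = mse(candidate, validation)
        if best is None or loss < best[1]:
            best = (weights, loss)
    return best


# Toy setup (all values are illustrative assumptions).
BASE = [[0.0, 0.0], [0.0, 0.0]]                  # "current model" weights
DELTAS = [
    outer([1.0, 0.0], [1.0, 0.0]),               # adapter for domain A
    outer([0.0, 1.0], [0.0, 1.0]),               # adapter for domain B
]
VALIDATION = [([1.0, 0.0], [0.5, 0.0]), ([0.0, 1.0], [0.0, 0.5])]

best_weights, best_loss = search_mixture(BASE, DELTAS, VALIDATION, steps=4)
print(best_weights, best_loss)
```

In this toy setting the interpolated weights exactly reproduce the effect of combining the two updates. For a real language model, interpolating adapters only approximates training on the corresponding data mixture, which is why the candidate scores should be read as a cheap simulation rather than a guarantee.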

Across pretraining, continual midtraining, and continual instruction tuning, OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of the baselines. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively. OP-Mix suggests a different view of language model training: not a sequence of distinct phases, but a single continuous process of learning from data.
