Simplifying the Modeling of Arbitrary Conditionals in Natural Language
Abstract: Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals, e.g., a block of text conditioned on both past and future tokens.
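To make the tractability issue concrete, here is a minimal sketch that assumes a hypothetical toy autoregressive model (a first-order chain over three tokens, with made-up probabilities `INIT` and `TRANS`) in place of a Transformer. It is not the AC-GPT method. It only illustrates that a left-to-right factorization gives the joint likelihood directly, while a block conditioned on both past and future tokens has to be obtained by normalizing over every candidate block, a cost that grows as `VOCAB ** block_len`.

```python
import itertools
import numpy as np

# Hypothetical toy autoregressive model (an assumption for illustration only):
# a first-order chain over 3 tokens.
VOCAB = 3
INIT = np.array([0.5, 0.3, 0.2])        # p(x_1)
TRANS = np.array([[0.7, 0.2, 0.1],      # p(x_t | x_{t-1}); each row sums to 1
                  [0.1, 0.6, 0.3],
                  [0.3, 0.3, 0.4]])

def joint_log_prob(seq):
    """Left-to-right factorization: log p(x) = sum_t log p(x_t | x_<t)."""
    logp = np.log(INIT[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += np.log(TRANS[prev, cur])
    return logp

def infill_conditional(left, right, block_len):
    """p(block | left, right), computed by enumerating all VOCAB**block_len blocks."""
    blocks = list(itertools.product(range(VOCAB), repeat=block_len))
    joint = np.array([
        np.exp(joint_log_prob(list(left) + list(block) + list(right)))
        for block in blocks
    ])
    # Normalizing the joint over all candidate blocks yields the conditional.
    return dict(zip(blocks, joint / joint.sum()))

# Distribution over a 2-token block given one past token and one future token.
probs = infill_conditional(left=[0], right=[2], block_len=2)
```

With three tokens the enumeration is trivial, but with a realistic vocabulary and a multi-token block the number of candidates becomes astronomically large, which is why exact evaluation of such conditionals is impractical for a standard causal model.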
Recent work aims to solve this problem through novel architectures, but these architectures often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT), which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals (including past, future, and mixed contexts) within a single forward pass.
Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective, both of which are essential for strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning.
Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.