Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Abstract: Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals, e.g., a block of text conditioned on both past and future tokens.
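
To make the tractability gap concrete, the following is a sketch in standard notation. The symbols $x_t$, $T$, $\mathcal{V}$ (vocabulary), and the block indices $i, j$ are introduced here for illustration and are not taken from the abstract.

```latex
% Autoregressive factorization of a length-T sequence:
% each factor conditions only on the tokens to its left.
p_\theta(x_{1:T}) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})

% Infilling a block x_{i:j} given the past x_{<i} and the future x_{>j},
% obtained from the joint by Bayes' rule.
p_\theta(x_{i:j} \mid x_{<i}, x_{>j})
  = \frac{p_\theta(x_{<i},\, x_{i:j},\, x_{>j})}
         {\sum_{x'_{i:j} \in \mathcal{V}^{\,j-i+1}} p_\theta(x_{<i},\, x'_{i:j},\, x_{>j})}
```

The numerator is a single joint likelihood, which a causal model evaluates in one pass. The denominator sums over $|\mathcal{V}|^{\,j-i+1}$ candidate blocks, so exact evaluation grows exponentially with the block length; this is the sense in which such conditionals are intractable for a plain left-to-right model.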

Recent work aims to solve this problem through novel architectures, but these often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT), which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals (including past, future, and mixed contexts) within a single forward pass.

Unlike prior approaches, our method preserves the standard left-to-right ordering and the next-token prediction objective, which are essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning.

Our empirical results indicate that the method outperforms baselines at modeling arbitrary conditionals without degrading standard left-to-right performance.