When transformers learn "impossible" languages, what do they learn?

Abstract: Recent work suggests that transformer language models show a bias towards human languages over unnatural (“impossible”) languages argued to be unacquirable by humans. However, this literature has largely based these claims on differences in sample efficiency and test-set perplexity, rather than on direct evaluations of the linguistic capacities that could plausibly explain non-attestation in human languages.

We evaluate two theoretically motivated linking hypotheses: that impossibility arises from a deficiency in grammatical sensitivity, or from a deficiency in generative production. Using GPT-2-style models trained on perturbed “impossible” variants of English, we measure sensitivity to grammaticality with BLiMP minimal pairs and find that performance degrades only gradually, with the degradation mediated by the language’s information locality.
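
To make the minimal-pair measure concrete, the sketch below computes BLiMP-style accuracy: the fraction of pairs for which a model assigns a higher log-probability to the acceptable sentence than to the unacceptable one. It is an illustration only, not the evaluation code behind these results. The names `minimal_pair_accuracy` and `make_bigram_scorer` are hypothetical, and the add-one-smoothed bigram scorer is a toy stand-in for a trained GPT-2-style model.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the scorer
    assigns a strictly higher log-probability to the acceptable sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


def make_bigram_scorer(corpus: List[str]) -> Callable[[str], float]:
    """Toy add-one-smoothed bigram model (assumption: stands in for a real LM)."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(vocab)

    def log_prob(sentence: str) -> float:
        # Sum of smoothed bigram log-probabilities over the whole sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


if __name__ == "__main__":
    scorer = make_bigram_scorer(
        ["the dogs bark", "the dog barks", "the cats sleep", "the cat sleeps"]
    )
    agreement_pairs = [
        ("the dogs bark", "the dogs barks"),
        ("the cat sleeps", "the cat sleep"),
    ]
    print(minimal_pair_accuracy(agreement_pairs, scorer))
```

With a real model, `log_prob` would instead sum the token-level log-probabilities the model assigns to each sentence; the accuracy computation itself stays the same.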

In contrast, these models exhibit pronounced failures in generation, producing substantially fewer high-quality sentences at longer lengths. Together, these results point to generative deficiency and transmission failure as a plausible linking hypothesis between language model behaviour and the non-attestation of impossible languages.