A developmental approach reveals the statistical learning of neural language models: Transformers generalize from the most abstract statistical patterns

Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representations of neural language models (NLMs). We train a series of generative Transformer models on a synthetic grammar and save the model states at multiple stages over the course of training.
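
To make the training setup more concrete, here is a minimal Python sketch of sampling a corpus from a synthetic grammar. The abstract does not describe the grammar actually used, so the rules in `GRAMMAR` and the helper names `expand` and `make_corpus` are hypothetical. The sketch only illustrates the general technique of generating training sentences by recursively expanding production rules. A fixed seed lets the same corpus be reused for every training run.

```python
import random

# HYPOTHETICAL toy grammar: the abstract does not specify the paper's
# synthetic grammar, so every rule below is illustrative only.
GRAMMAR: dict[str, list[list[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["small"], ["red"]],
    "N": [["cat"], ["dog"], ["ball"]],
    "V": [["sees"], ["chases"]],
}


def expand(symbol: str, rng: random.Random) -> list[str]:
    """Recursively rewrite a symbol until only terminal words remain."""
    if symbol not in GRAMMAR:  # no rule for it, so it is a terminal word
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])  # uniform choice among rules
    words: list[str] = []
    for child in production:
        words.extend(expand(child, rng))
    return words


def make_corpus(n_sentences: int, seed: int = 0) -> list[list[str]]:
    """Sample a reproducible corpus of tokenized sentences (hypothetical helper)."""
    rng = random.Random(seed)
    return [expand("S", rng) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in make_corpus(3):
        print(" ".join(sentence))
```

Every sampled sentence has the shape Det (Adj) N V Det (Adj) N. The corpus therefore contains both sentence-level regularities and word-to-word dependencies, which a model could learn at different stages of training.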

By analyzing how the internal representations of these models change along this developmental path, we found that NLMs acquire the most abstract, global statistical knowledge at the beginning of learning and only later acquire relatively local statistical dependencies.
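
The sketch below shows one hypothetical way to make the distinction between global and local statistics measurable. It assumes, as our own illustration and not something stated in the abstract, that "global" can be approximated by context-free unigram frequencies and "local" by bigram transitions. Under this assumption, a checkpoint whose next-word prediction matches the unigram distribution ignores the preceding word, which is one form of over-generalization. A later checkpoint that matches the bigram distribution has picked up the local dependency. The functions `unigram_dist`, `bigram_dist`, `kl_divergence`, and `closer_to` are illustrative names. The predicted distributions in the usage example are hand-written, not taken from any trained model.

```python
import math
from collections import Counter, defaultdict

Dist = dict[str, float]


def unigram_dist(corpus: list[list[str]]) -> Dist:
    """Context-free word frequencies: the 'global' statistic in this sketch."""
    counts = Counter(w for sentence in corpus for w in sentence)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def bigram_dist(corpus: list[list[str]]) -> dict[str, Dist]:
    """P(next | previous word): the 'local' statistic in this sketch."""
    pairs: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in zip(sentence, sentence[1:]):
            pairs[prev][nxt] += 1
    return {
        prev: {w: c / sum(cnt.values()) for w, c in cnt.items()}
        for prev, cnt in pairs.items()
    }


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-9) -> float:
    """KL(p || q); eps stands in for zero mass in q to avoid log(0)."""
    return sum(pv * math.log(pv / max(q.get(w, 0.0), eps))
               for w, pv in p.items() if pv > 0)


def closer_to(model_pred: Dist, prev: str, uni: Dist, bi: dict[str, Dist]) -> str:
    """Label a next-word prediction as nearer the 'global' or 'local' statistic."""
    d_global = kl_divergence(model_pred, uni)
    d_local = kl_divergence(model_pred, bi[prev])
    return "global" if d_global < d_local else "local"


if __name__ == "__main__":
    corpus = [["the", "cat", "sees", "the", "dog"],
              ["a", "dog", "sees", "a", "cat"]]
    uni, bi = unigram_dist(corpus), bigram_dist(corpus)
    early = {w: 0.2 for w in ["the", "a", "cat", "dog", "sees"]}  # ignores context
    late = {"cat": 0.5, "dog": 0.5}  # respects what can follow "the"
    print(closer_to(early, "the", uni, bi))  # global
    print(closer_to(late, "the", uni, bi))   # local
```

Applied to each saved checkpoint in turn, a shift from "global" to "local" labels would be one simple signature of the learning order described above. Richer analyses of internal representations would be needed to go beyond this output-level proxy.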

This learning path contains many over-generalizations from the very beginning, and these over-generalizations are gradually constrained in later stages of learning. Based on this observation, we propose a new framework to explain the statistical learning and language cognition of NLMs.