H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
Abstract: Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models are proficient at a wide variety of tasks that require hierarchical reasoning, but there is limited analysis of how these models geometrically represent the latent structures that such reasoning requires.
To this end, we develop H-probes, a collection of linear probes that extract hierarchical structure, specifically depth and pairwise distance, from latent representations.
In synthetic tree traversal tasks, the H-probes robustly find the subspaces that contain the hierarchical structure needed to complete the tasks. In comprehensive ablation experiments, we further show that these hierarchy-containing subspaces are low-dimensional, causally important for high task performance, and generalize both in-domain and out-of-domain.
We also find analogous, though weaker, hierarchical structure in real-world hierarchical contexts such as mathematical reasoning traces. These results demonstrate that models represent hierarchy not only at the level of syntax and concepts but also at deeper levels of abstraction, including the reasoning process itself.
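To make the idea of a linear depth probe and a subspace ablation concrete, the following is a minimal sketch and not the paper's implementation. It rests on several assumptions: the "activations" `H` are synthetic vectors in which node depth is encoded along one random direction plus Gaussian noise, the probe is an ordinary least-squares fit, and ablation means projecting out the single probe direction. The names `random_tree_depths`, `fit_probe`, `ablate`, and `r2` are hypothetical helpers introduced only for illustration. A pairwise-distance probe is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tree_depths(n_nodes, rng):
    """Depth of each node in a random rooted tree (node 0 is the root)."""
    depth = np.zeros(n_nodes)
    for v in range(1, n_nodes):
        parent = rng.integers(0, v)      # attach v to a random earlier node
        depth[v] = depth[parent] + 1
    return depth

def fit_probe(H, y):
    """Least-squares linear probe: returns (w, b) with H @ w + b ~= y."""
    X = np.hstack([H, np.ones((len(H), 1))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]

def r2(y, y_hat):
    """Coefficient of determination of predictions y_hat against targets y."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def ablate(H, w):
    """Remove the component of every row of H along the probe direction w."""
    u = w / np.linalg.norm(w)
    return H - np.outer(H @ u, u)

# ASSUMPTION: toy stand-in for language-model activations. Depth is written
# along one random unit direction and isotropic noise is added on top.
n_nodes, dim, noise = 2000, 16, 0.5
depth = random_tree_depths(n_nodes, rng)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
H = np.outer(depth, direction) + noise * rng.normal(size=(n_nodes, dim))

# Hold out a quarter of the nodes to evaluate the probe.
idx = rng.permutation(n_nodes)
train, test = idx[:1500], idx[1500:]

# Fit the depth probe and score it on held-out nodes.
w, b = fit_probe(H[train], depth[train])
r2_probe = r2(depth[test], H[test] @ w + b)

# Project out the probe direction, then refit a fresh probe on what is left.
H_abl = ablate(H, w)
w_abl, b_abl = fit_probe(H_abl[train], depth[train])
r2_ablated = r2(depth[test], H_abl[test] @ w_abl + b_abl)

print(f"held-out R^2 of depth probe:         {r2_probe:.3f}")
print(f"held-out R^2 after ablating 1 dim:   {r2_ablated:.3f}")
```

In this toy setting, removing a single direction sharply reduces how well depth can be linearly decoded, which mirrors the abstract's description of a low-dimensional subspace carrying the hierarchy. Because the fitted direction only approximates the true one, some depth information can survive a single projection, so the refit score need not fall to zero.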