Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

Abstract: In many reasoning tasks, large language models (LLMs) rely on structured external knowledge, such as graphs and tables, which is typically linearized into sequential token representations. However, even when sufficient knowledge is available, LLMs can still produce hallucinated outputs, and the underlying mechanisms behind such failures remain poorly understood.
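As an illustration of what "linearized" means here, the sketch below flattens a small graph and a small table into plain text sequences of the kind that would then be tokenized. The function names and the exact serialization formats (parenthesized triples, pipe-separated rows) are assumptions chosen for illustration; the abstract does not specify which format is used.

```python
def linearize_triples(triples):
    """Flatten (head, relation, tail) triples into one text sequence.

    The "(h, r, t)" format is an illustrative assumption, not a format
    taken from the abstract.
    """
    return " ".join(f"({h}, {r}, {t})" for h, r, t in triples)


def linearize_table(header, rows):
    """Flatten a table into pipe-separated lines, header first.

    Again a hypothetical serialization; real pipelines may use
    Markdown, JSON, or other layouts.
    """
    lines = [header, *rows]
    return "\n".join(" | ".join(str(cell) for cell in line) for line in lines)


if __name__ == "__main__":
    graph = [("Paris", "capital_of", "France"), ("France", "part_of", "Europe")]
    print(linearize_triples(graph))
    print(linearize_table(["city", "country"], [["Paris", "France"]]))
```

Once serialized this way, the graph or table structure is only implicit in token order and delimiters, which is the setting the abstract analyzes.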

We investigate these mechanisms and find that hallucinations arise from systematic internal dynamics rather than random noise. First, attention concentrates disproportionately on shortcut-like structural cues instead of being distributed across the full context. Second, feed-forward representations fail to ground the provided knowledge, causing the model to revert to parametric memory.
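One simple way to quantify how concentrated an attention distribution is over context tokens is normalized entropy. The sketch below is an illustrative assumption and not the metric used in this work: the function name is hypothetical, and the input is assumed to be a list of non-negative attention weights from one head to the context tokens.

```python
import math


def normalized_attention_entropy(weights):
    """Return entropy of an attention distribution, scaled to [0, 1].

    Values near 1 mean attention is spread evenly over the context;
    values near 0 mean it is concentrated on a few tokens.
    Illustrative measure only, not taken from the abstract.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # With a single token the entropy is 0; avoid dividing by log(1) = 0.
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    return entropy / max_entropy


if __name__ == "__main__":
    spread = [0.25, 0.25, 0.25, 0.25]
    peaked = [0.97, 0.01, 0.01, 0.01]
    print(normalized_attention_entropy(spread))
    print(normalized_attention_entropy(peaked))
```

Under this measure, a head that attends almost entirely to one structural cue would score far lower than a head that reads the whole linearized context.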

Moreover, our results indicate that hallucination is consistently associated with failures in semantic grounding within feed-forward layers, while attention allocation exhibits greater task-dependent variability. Finally, we show that these mechanistic patterns generalize beyond single-hop graphs to multi-hop and tabular settings, enabling effective hallucination detection across structured knowledge formats.
