Geometric Deviation as an Unsupervised Pre-Generation Reliability Signal: Probing LLM Representations for Answerability
Abstract: A reliable language model should be able to signal, prior to generation, when a query falls outside its knowledge. We investigate whether representation geometry can provide such a pre-generation signal by measuring how far a query's hidden states deviate from a reference set of answerable prompts. The approach requires no labeled failure data and no access to model outputs.
Across three instruction-tuned models (Llama 3.1-8B, Qwen 2.5-7B, and Mistral-7B-Instruct) and three prompt forms (Math, Fact, Code), we find that geometry primarily encodes task form. Within mathematical prompts, unanswerable inputs consistently deviate from the answerable centroid, yielding strong separation (ROC-AUC 0.78-0.84). This single-pass pre-generation signal outperforms a simple refusal baseline and compares favorably to self-consistency. It also captures cases where models do not explicitly refuse.
In contrast, no reliable geometric signal emerges for factual prompts, indicating that the effect is form-conditional rather than universal. Code prompts show large effect sizes with higher variance, suggesting partial generalization beyond mathematical form. A layer-wise analysis reveals that the signal arises in early layers and gradually attenuates toward the output.
These results suggest that answerability-related geometry is established before the final stages of generation. Together, they indicate that geometric deviation can serve as a lightweight pre-generation signal that is reliable in structured domains with formal answerability constraints, with clear boundaries on where it generalizes.
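To make the scoring idea concrete, the following is a minimal sketch of a centroid-deviation score and its evaluation by ROC-AUC. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the distance metric, so Euclidean distance to the mean of the reference set is assumed here. The function names `centroid_deviation` and `roc_auc` are hypothetical, and the vectors are synthetic Gaussian placeholders standing in for hidden states that would, in practice, be extracted from a model in a single forward pass.

```python
import numpy as np


def centroid_deviation(reference, queries):
    """Euclidean distance of each query vector from the reference centroid.

    Assumption: the abstract only says "deviation from an answerable
    reference set"; the choice of Euclidean distance is ours.
    """
    centroid = reference.mean(axis=0)
    return np.linalg.norm(queries - centroid, axis=1)


def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as one half, which matches the usual rank-based ROC-AUC.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# Synthetic placeholders only: these are NOT real hidden states or results.
rng = np.random.default_rng(0)
dim = 16
reference = rng.normal(0.0, 1.0, size=(200, dim))    # answerable reference set
answerable = rng.normal(0.0, 1.0, size=(50, dim))    # held-out answerable queries
unanswerable = rng.normal(0.5, 1.0, size=(50, dim))  # shifted, hypothetical

dev_answerable = centroid_deviation(reference, answerable)
dev_unanswerable = centroid_deviation(reference, unanswerable)

# Unanswerable queries are the positive class: a larger deviation should
# indicate a less reliable query.
auc = roc_auc(dev_unanswerable, dev_answerable)
print(f"ROC-AUC on synthetic data: {auc:.3f}")
```

Only the reference set is needed to fit the score, which is what makes the signal unsupervised. The unanswerable examples appear solely at evaluation time, when the ROC-AUC is computed.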