LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

Abstract: Large language models (LLMs) are increasingly applied to structured clinical data, yet whether they can recognize the limits of their own knowledge on such tasks remains unexplored. We study this question through the lens of cross-model attribution divergence, comparing the feature attributions of Qwen 2.5 7B and XGBoost on a clinical prediction task, with the goal of reducing epistemic uncertainty on structured tasks.
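
The abstract does not define how attribution divergence is computed. As a minimal sketch under our own assumption, one plausible divergence is the L1 distance between the two models' normalised absolute attribution vectors over the same features. It runs from 0 (identical importance profiles) to 2 (disjoint ones). The function name `attribution_disagreement` and this formula are hypothetical. The sketch also assumes the LLM's attributions are already available as a per-feature vector aligned with XGBoost's SHAP values. How they are actually obtained is left open here.

```python
import numpy as np

def attribution_disagreement(attr_a, attr_b):
    """Hypothetical divergence score (an assumption, not the paper's definition):
    L1 distance between the L1-normalised absolute attribution vectors of two
    models over the same features. 0 = same importance profile, 2 = disjoint."""
    a = np.abs(np.asarray(attr_a, dtype=float))
    b = np.abs(np.asarray(attr_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("attribution vectors must cover the same features")
    if a.sum() == 0 or b.sum() == 0:
        raise ValueError("attribution vectors must not be all zero")
    # Compare how each model distributes importance, ignoring sign and scale.
    return float(np.sum(np.abs(a / a.sum() - b / b.sum())))

# Toy vectors (illustrative only): SHAP-style values vs. LLM-side attributions.
xgb_shap = [0.5, -0.3, 0.2, 0.0]
llm_attr = [0.0, 0.1, 0.1, 0.8]
print(round(attribution_disagreement(xgb_shap, llm_attr), 2))
```

Normalising first makes the score insensitive to the very different scales of SHAP values and LLM-side importance estimates. Taking absolute values means the score compares only which features each model relies on, not in which direction they push the prediction.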

We report four findings. First, LLM verbalized confidence is epistemically vacuous: it stays near-constant (0.856-0.937) regardless of whether accuracy is 49% or 75.3%, tracking prompt format rather than prediction quality.
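
Expected calibration error (ECE), the metric used in the fourth finding, shows why constant confidence is uninformative. If a model always reports the same confidence c, every prediction lands in one bin, so ECE reduces to |accuracy - c|. The calibration gap then simply mirrors accuracy. Below is a minimal sketch of standard equal-width-bin ECE. The toy data (a constant 0.9 at 49% and 75% accuracy) is synthetic and only illustrates this point. It is not taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (fraction of samples in bin)
    * |accuracy in bin - mean confidence in bin|."""
    conf = np.asarray(confidence, dtype=float)
    corr = np.asarray(correct, dtype=float)
    # Bin k covers (k/n_bins, (k+1)/n_bins]; a confidence of exactly 0 goes to bin 0.
    idx = np.clip(np.ceil(conf * n_bins).astype(int) - 1, 0, n_bins - 1)
    ece = 0.0
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            ece += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return float(ece)

# Synthetic illustration: the model always verbalizes 0.9, whatever its accuracy.
conf = np.full(100, 0.9)
low_acc = np.array([1] * 49 + [0] * 51)   # 49% of predictions correct
high_acc = np.array([1] * 75 + [0] * 25)  # 75% of predictions correct
print(round(expected_calibration_error(conf, low_acc), 3))   # 0.41
print(round(expected_calibration_error(conf, high_acc), 3))  # 0.15
```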

Second, the LLM exhibits an inverse difficulty effect: on cases where XGBoost is 99% accurate, LLM accuracy falls to 64.8%, whereas on cases where XGBoost is moderately uncertain, the LLM matches it (73.8% vs. 73.1%).
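
An analysis of this kind can be sketched by bucketing cases by XGBoost's confidence max(p, 1 - p) and comparing both models' accuracy within each bucket. This is a minimal sketch. The function name, the bucket edges, and the toy arrays are our own assumptions, since the abstract does not give the paper's stratification.

```python
import numpy as np

def stratify_by_reference_confidence(ref_prob, ref_correct, llm_correct,
                                     edges=(0.5, 0.8, 1.0)):
    """Hypothetical helper: group cases by the reference model's confidence
    max(p, 1 - p) and report per-bucket size and accuracy of both models."""
    p = np.asarray(ref_prob, dtype=float)
    conf = np.maximum(p, 1.0 - p)
    ref_correct = np.asarray(ref_correct, dtype=bool)
    llm_correct = np.asarray(llm_correct, dtype=bool)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Buckets are [lo, hi), except the last one, which also includes hi.
        upper = conf <= hi if hi == edges[-1] else conf < hi
        mask = (conf >= lo) & upper
        n = int(mask.sum())
        rows.append({
            "bucket": (lo, hi),
            "n": n,
            "ref_acc": float(ref_correct[mask].mean()) if n else float("nan"),
            "llm_acc": float(llm_correct[mask].mean()) if n else float("nan"),
        })
    return rows

# Toy data (illustrative only): the LLM does worst where XGBoost is most confident.
rows = stratify_by_reference_confidence(
    ref_prob=[0.95, 0.02, 0.97, 0.6, 0.35, 0.62],
    ref_correct=[1, 1, 1, 1, 0, 1],
    llm_correct=[0, 1, 0, 1, 1, 0],
)
for r in rows:
    print(r)
```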

Third, few-shot examples and SHAP-derived feature evidence are orthogonal, super-additive interventions (their combined effect exceeds the sum of their individual effects): together, without any training, they reduce the Attribution Disagreement Score (ADS) from 1.54 to 0.38 and improve accuracy from 49% to 75.3%.
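
Super-additivity can be checked with a 2x2 ablation: neither intervention, each one alone, and both together. The interaction term is the joint improvement minus the sum of the two individual improvements, and a positive value indicates super-additivity. This is a minimal sketch. The function name and all numbers below are hypothetical placeholders, because the abstract reports only the baseline and combined results, not the single-intervention arms.

```python
def interaction_effect(base, a_only, b_only, both, higher_is_better=True):
    """Interaction term of a 2x2 ablation (neither / A / B / A+B).

    Returns joint improvement minus the sum of individual improvements.
    Positive => super-additive. Use higher_is_better=False for metrics such
    as a disagreement score, where lower values are better."""
    sign = 1.0 if higher_is_better else -1.0
    gain_a = sign * (a_only - base)
    gain_b = sign * (b_only - base)
    gain_both = sign * (both - base)
    return gain_both - (gain_a + gain_b)

# Hypothetical toy numbers, not results from the paper.
acc = interaction_effect(base=0.50, a_only=0.55, b_only=0.58, both=0.70)
ads = interaction_effect(base=1.50, a_only=1.30, b_only=1.20, both=0.60,
                         higher_is_better=False)
print(f"accuracy interaction: {acc:+.2f}")  # +0.07
print(f"ADS interaction: {ads:+.2f}")       # +0.40
```

Flipping the sign for lower-is-better metrics keeps one convention for both accuracy and a disagreement score: a positive interaction always means the interventions help more together than their separate gains would predict.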

Fourth, a cross-model calibrator that estimates LLM reliability from attribution divergence signals reduces expected calibration error (ECE) from 0.254 to 0.080, replacing uninformative verbalized confidence with patient-specific reliability estimates without accessing model internals or requiring repeated inference.
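
One way to build such a calibrator is a logistic regression that maps per-case, output-only signals to the probability that the LLM's prediction is correct. This is a minimal sketch under stated assumptions. The feature set (a per-case disagreement score, XGBoost's confidence, and whether the two models' labels agree), the helper `fit_reliability_calibrator`, and the synthetic data are all hypothetical. The paper's actual calibrator design is not described in the abstract. In practice the calibrator would be fitted on held-out cases and evaluated on separate ones.

```python
import numpy as np

def fit_reliability_calibrator(features, llm_correct, lr=0.5, n_iter=2000):
    """Hypothetical calibrator: logistic regression for P(LLM correct | features),
    fitted by batch gradient descent on standardised features.
    Returns (predict_fn, weights) with weights[0] as the intercept."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(llm_correct, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xs = np.hstack([np.ones((len(X), 1)), (X - mu) / sd])  # add intercept column
    w = np.zeros(Xs.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xs @ w))
        w -= lr * Xs.T @ (p - y) / len(y)  # gradient of the mean log-loss

    def predict(new_features):
        Z = (np.asarray(new_features, dtype=float) - mu) / sd
        Z = np.hstack([np.ones((len(Z), 1)), Z])
        return 1.0 / (1.0 + np.exp(-Z @ w))

    return predict, w

# Synthetic data (an assumption for illustration): higher disagreement means a
# lower chance that the LLM is right; label agreement with XGBoost raises it.
rng = np.random.default_rng(0)
n = 2000
ads = rng.uniform(0.0, 2.0, n)       # per-case disagreement score
xgb_conf = rng.uniform(0.5, 1.0, n)  # reference-model confidence
agree = rng.integers(0, 2, n)        # 1 if LLM and XGBoost predict the same label
true_logit = 1.0 - 2.0 * ads + 1.5 * agree
llm_correct = rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))

X = np.column_stack([ads, xgb_conf, agree])
predict, w = fit_reliability_calibrator(X, llm_correct)
# Per-case reliability for a low-disagreement vs. a high-disagreement case.
print(predict([[0.2, 0.9, 1], [1.8, 0.9, 0]]).round(2))
```

Unlike a verbalized confidence, the output varies per case with the divergence signals. Its calibration can then be measured with ECE on held-out data.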

We frame these findings as a cold-start problem for LLMs on structured data and outline a path toward genuine epistemic self-awareness.