When Should a Language Model Trust Itself? Same-Model Self-Verification as a Conditional Confidence Signal

Abstract: Same-model self-verification, in which a model is prompted to audit its own predicted answer, is a plausible confidence signal for selective prediction, but its practical value remains unclear once strong likelihood-based baselines are taken seriously.

We evaluate self-verification against two such baselines, LL-AVG and LL-SUM, on ARC-Challenge and TruthfulQA-MC across multiple model families, scales, and prompt variants. We measure not only how well each signal ranks correct answers above incorrect ones, but also abstention quality, using the area under the risk-coverage curve (AURC) and operating-point analyses.
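
As a rough illustration of the quantities named above, the following minimal sketch computes two likelihood-based scores and an AURC value. It rests on assumptions the abstract does not spell out: that LL-SUM is the summed token log-probability of an answer option, that LL-AVG is its length-normalized mean, and that AURC is the mean selective risk over all coverage levels when examples are ranked by confidence. The function names `ll_sum`, `ll_avg`, and `aurc` are hypothetical and are not taken from the paper's code.

```python
def ll_sum(token_logprobs):
    """Assumed LL-SUM: total log-probability of the answer tokens."""
    return sum(token_logprobs)


def ll_avg(token_logprobs):
    """Assumed LL-AVG: mean (length-normalized) token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def aurc(confidence, correct):
    """Area under the risk-coverage curve (lower is better).

    Examples are sorted from most to least confident. At coverage k/n the
    risk is the error rate among the k most confident examples; the AURC
    here is the average of those risks over k = 1..n.
    """
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("inputs must be non-empty and of equal length")
    # Python's sort is stable, so ties keep their original order.
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    errors_so_far = 0
    risks = []
    for k, i in enumerate(order, start=1):
        errors_so_far += 0 if correct[i] else 1
        risks.append(errors_so_far / k)
    return sum(risks) / len(risks)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    conf = [0.9, 0.8, 0.3, 0.1]
    is_correct = [True, True, False, False]
    print(aurc(conf, is_correct))
```

Under this definition, a confidence signal that ranks every correct answer above every incorrect one yields a lower AURC than one that ranks them in the opposite order, which is what makes the metric usable for comparing self-verification with the likelihood baselines.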

The results are sharply task- and model-dependent. On ARC-Challenge, self-verification substantially improves over LL-AVG for Phi-2 and the Qwen models, with the largest gains for Qwen-7B.

On TruthfulQA-MC, however, the signal is less reliable: smaller models can be sensitive to the prompt variant, DeepSeek-R1-Distill-8B degrades relative to LL-AVG, and LL-SUM often remains the stronger practical baseline.

We therefore do not treat self-verification as a general-purpose uncertainty estimator. In this setting, it is better understood as a conditional confidence signal whose value depends on task type, model family, prompt formulation, and, crucially, the baseline it must beat.