PromptNCE: Pointwise Mutual Information Predictions Using Only LLMs and Contrastive Estimation Prompts
Abstract: Estimating mutual information from text usually requires training a task-specific critic, which limits its use in low-data settings. We ask whether large language models can instead estimate pointwise mutual information (PMI) zero-shot, using only prompts and elicited probabilities.
We introduce a benchmark with human-derived ground-truth PMI across three publicly available datasets, and evaluate five information-theoretic prompting-based estimators. Our main method, PromptNCE, frames conditional probability estimation as a contrastive task and augments the candidate set with an explicit OTHER category.
We show theoretically that adding OTHER recovers the true conditional P(y | x) rather than just a ranking over listed candidates, turning a contrastive prompt into a general-purpose zero-shot probability estimator. PromptNCE is the best zero-shot method on all three datasets, reaching Spearman correlation up to 0.82 with human-derived PMI.
We also present a case study in computer science education showing how these estimators can be used to score student knowledge summaries in a low-data setting.
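The role of the OTHER category can be illustrated with a small numerical sketch. This is not the paper's prompt or implementation: the candidate labels, the elicited weights, and the marginal P(y) below are all made-up values, and the helper names `normalize` and `pmi` are hypothetical. The sketch only shows why normalizing over the listed candidates alone yields a probability conditioned on the candidate list, whereas including OTHER leaves room for the unlisted mass, and how a PMI value then follows from the standard definition PMI(x; y) = log(P(y | x) / P(y)).

```python
import math

OTHER = "OTHER"  # explicit catch-all category (label name is an assumption)


def normalize(weights):
    """Turn non-negative elicited weights into a probability distribution."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {label: w / total for label, w in weights.items()}


def pmi(p_y_given_x, p_y):
    """Pointwise mutual information in nats: log(P(y|x) / P(y))."""
    return math.log(p_y_given_x / p_y)


# Hypothetical weights an LLM might assign for one context x.
elicited = {"loops": 3.0, "recursion": 1.0, OTHER: 4.0}

# With OTHER: an estimate of P(y | x) over the full outcome space.
with_other = normalize(elicited)

# Without OTHER: only P(y | x, y is one of the listed candidates).
without_other = normalize({k: v for k, v in elicited.items() if k != OTHER})

# Hypothetical marginal P(y) for "loops", e.g. elicited with a context-free prompt.
p_loops = 0.125
pmi_loops = pmi(with_other["loops"], p_loops)
```

In this toy case the candidate-only normalization assigns 0.75 to "loops", while the version with OTHER assigns 0.375. Both give the same ranking of the listed candidates, but only the latter is on the scale of P(y | x) that a PMI estimate needs.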