Probabilistic Attribution For Large Language Models

Abstract: The generative nature of Large Language Models (LLMs) is reflected in the conditional probabilities they compute to sample each response token given the previous tokens. These probabilities encode the distributional structure that the model learns in training and exploits in inference.

In this work, we use these probabilities to situate LLMs within the mathematical theory of stochastic processes. We use this framework to design a model-agnostic probabilistic token attribution measure, using Bayes' rule to invert the next-token log-probabilities so as to capture the model's internal representation of the distribution over token sequences. The representation is independent of the model's computational structure.
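
As a rough illustration of what inverting next-token probabilities with Bayes' rule can mean, the relations below give one plausible reading. The notation is assumed here and is not taken from the paper: x_{1:n} is a token sequence, x_{-i} is that sequence with position i left unspecified, and V is the vocabulary. The first line is the chain rule that turns next-token log-probabilities into a sequence log-probability; the second recovers the distribution of a single token given all the others.

```latex
% Assumed notation for illustration only; the abstract does not state these formulas.
\begin{align}
\log P(x_{1:n}) &= \sum_{t=1}^{n} \log P(x_t \mid x_{<t}) \\
P(x_i = v \mid x_{-i}) &= \frac{P(x_{1:i-1},\, v,\, x_{i+1:n})}
                             {\sum_{v' \in V} P(x_{1:i-1},\, v',\, x_{i+1:n})}
\end{align}
```

Both right-hand sides use only quantities the model already exposes as next-token probabilities, which is consistent with the claim that the representation does not depend on the model's computational structure.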

This representation yields the conditional probability of the response given the prompt, and of the response given the prompt with a token marginalized away. Our attribution score is the log of the ratio of these probabilities. We further compute the entropy of each individual prompt token's distribution, conditioned on the remaining context. The interplay between entropy and attribution score sheds light on LLM behavior.
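
The score and entropy described above can be sketched on a toy model where marginalization is exact. Everything in this block is an assumption made for illustration: the three-token vocabulary, the hypothetical `BIGRAM` table standing in for an LLM's next-token probabilities, the function names, and the choice to condition the entropy on the rest of the prompt only. It is not the paper's implementation, which would have to handle a full vocabulary and a real model.

```python
import math

VOCAB = ["a", "b", "c"]

# Hypothetical stand-in for an LLM: P(next token | previous token).
# The key None holds the start-of-sequence distribution. Rows sum to 1.
BIGRAM = {
    None: {"a": 0.5, "b": 0.3, "c": 0.2},
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.2, "b": 0.2, "c": 0.6},
    "c": {"a": 0.7, "b": 0.2, "c": 0.1},
}


def next_token_logprob(context, token):
    """Log-probability of `token` given the preceding tokens."""
    prev = context[-1] if context else None
    return math.log(BIGRAM[prev][token])


def sequence_logprob(tokens):
    """Chain rule: sum of next-token log-probabilities."""
    return sum(next_token_logprob(tokens[:t], tok) for t, tok in enumerate(tokens))


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def response_logprob(prompt, response):
    """log P(response | prompt) = log P(prompt, response) - log P(prompt)."""
    return sequence_logprob(prompt + response) - sequence_logprob(prompt)


def marginalized_response_logprob(prompt, response, i):
    """log P(response | prompt with position i summed over the vocabulary)."""
    joint, prior = [], []
    for v in VOCAB:
        alt = prompt[:i] + [v] + prompt[i + 1:]
        joint.append(sequence_logprob(alt + response))
        prior.append(sequence_logprob(alt))
    return logsumexp(joint) - logsumexp(prior)


def attribution(prompt, response, i):
    """Log of the ratio of the two conditional probabilities."""
    return response_logprob(prompt, response) - marginalized_response_logprob(
        prompt, response, i
    )


def token_entropy(prompt, i):
    """Entropy (in nats) of prompt token i given the other prompt tokens."""
    logp = [sequence_logprob(prompt[:i] + [v] + prompt[i + 1:]) for v in VOCAB]
    norm = logsumexp(logp)
    return -sum(math.exp(lp - norm) * (lp - norm) for lp in logp)


if __name__ == "__main__":
    prompt, response = ["a", "b"], ["c"]
    for i, tok in enumerate(prompt):
        score = attribution(prompt, response, i)
        entropy = token_entropy(prompt, i)
        print(f"token {tok!r}: score={score:.4f}, entropy={entropy:.4f}")
```

Because the toy model is a bigram chain, the response depends on the prompt only through its last token, so the score for the first prompt token is zero up to floating-point error while the score for the last token is positive. A real LLM has no such Markov shortcut, and summing over a full vocabulary at every position is far more expensive than this three-token enumeration suggests.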

We evaluate 8 models across 7 prompts and investigate anomalies, token sensitivity, response stability, model stability, and training convergence, thereby improving interpretability and guiding users to focus on uncertain or unstable parts of the generation.

Paper Details:

Authors: Shilpika Shilpika, Carlo Graziani, Bethany Lusch, Venkatram Vishwanath, Michael E. Papka
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
arXiv ID: 2605.21726
