A Definition of Good Explanations and the Challenges Explaining LLM Outputs

Abstract: How to define a good explanation is a long-standing philosophical debate that has recently attracted renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but to produce good explanations of AI systems, we must first understand what good explanations are.

In this paper, we propose a definition inspired by the notion of counterfactual explanations; however, we argue that one must also take into account the interlocutor's prior belief in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why it is difficult to produce good explanations for LLM outputs.

Paper Details:

Authors: Louis Mahon, Elliot Ford, Callum Hackett
Subject: Artificial Intelligence (cs.AI)
arXiv ID: 2606.14838
Submission Date: 12 Jun 2026
