Topics as Proxies for Sociodemographics: How Conversational Context Affects LLM Answers

Abstract: When large language models (LLMs) are used in high-stakes scenarios, such as legal, medical and financial advice, even a single conversation history is enough to drive differences in outcomes between users. Prior work has demonstrated that this results in outcome disparities between sociodemographic groups, with some groups receiving more advantageous outcomes than others.

In this work, we demonstrate that LLMs actually struggle to infer user sociodemographics from a single conversation history and that, although there are disparities between sociodemographic groups, they are minimal in magnitude. To identify the main driver of these disparities, we compare user sociodemographics with a range of (psycho)linguistic features of conversations, including conversation topic, emotions, and readability.

We find that, within a conversational context, conversation topics are the strongest predictors of LLM-generated advice. To some extent, these topics function as proxies for sociodemographic groups, and they often affect advice in unpredictable ways. This is cause for concern and highlights the need for future research to better understand and, if needed, mitigate the effect of conversational context on LLM outputs in high-stakes scenarios.
