Aristotelian Virtue Profiling of LLMs through Ethical Dilemmas
Abstract: Large Language Models (LLMs) often face ethical tradeoffs in which several responses may be defensible but express different priorities, such as fairness, honesty, courage, or restraint. We introduce VirtueMap, a framework for describing these patterns through an Aristotelian virtue-ethics lens.
Instead of asking for a single correct answer, VirtueMap asks humans or LLMs to rank all five responses to each of seven general, non-lethal, non-political, and non-religious ethical dilemmas. To define the reference orderings used for scoring, we first proposed, for each dilemma and virtue, an ordering of the five responses from most to least expressive of that virtue.
We then collected more than 100 respondent evaluations per ordering and retained it as operational ground truth only when at least 95% confirmed it. Rankings are scored against these retained orderings using normalized Borda alignment, yielding profiles over Practical Wisdom, Justice, Truthfulness, Courage, and Temperance.
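The retention rule and the scoring step can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one plausible reading of "normalized Borda alignment": each ranking assigns Borda points (4 for the top response down to 0 for the bottom), the raw score is the sum of products of a respondent's points and the reference points, and the result is min-max rescaled so that an identical ranking scores 1 and a fully reversed ranking scores 0. The function names `borda_points`, `normalized_borda_alignment`, and `retain_ordering` are hypothetical and are not taken from the VirtueMap release.

```python
from typing import Dict, Sequence


def borda_points(ranking: Sequence[str]) -> Dict[str, int]:
    """Assign Borda points: n-1 for the first-ranked item down to 0 for the last."""
    n = len(ranking)
    return {item: n - 1 - pos for pos, item in enumerate(ranking)}


def normalized_borda_alignment(ranking: Sequence[str],
                               reference: Sequence[str]) -> float:
    """Alignment in [0, 1] between a ranking and a reference ordering.

    Assumed definition (not confirmed by the source): the sum of products
    of Borda points, min-max normalized between the fully reversed
    ranking (0.0) and the identical ranking (1.0).
    """
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two responses to rank")
    if len(set(ranking)) != len(ranking) or sorted(ranking) != sorted(reference):
        raise ValueError("ranking must be a permutation of the reference items")

    resp = borda_points(ranking)
    ref = borda_points(reference)
    raw = sum(resp[item] * ref[item] for item in reference)

    best = sum(k * k for k in range(n))             # identical ordering
    worst = sum(k * (n - 1 - k) for k in range(n))  # reversed ordering
    return (raw - worst) / (best - worst)


def retain_ordering(confirmations: int, total: int,
                    min_responses: int = 100,
                    threshold: float = 0.95) -> bool:
    """Keep a proposed ordering only if more than `min_responses`
    evaluations were collected and at least `threshold` of them
    confirmed it, mirroring the rule stated in the abstract."""
    return total > min_responses and confirmations / total >= threshold


if __name__ == "__main__":
    reference = ["A", "B", "C", "D", "E"]  # hypothetical response labels
    print(normalized_borda_alignment(["B", "A", "C", "D", "E"], reference))
    print(retain_ordering(confirmations=96, total=101))
```

Under this assumed definition, a respondent's profile would be obtained by averaging the alignment for each virtue over the dilemmas whose reference ordering for that virtue was retained.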
We apply VirtueMap to nine LLM families in a repeated-run evaluation and find high mean rank consistency (90.3%), with the largest differences appearing on Courage, Temperance, and Justice. We also release an interactive website that computes profiles locally in the browser and compares respondents with measured LLM profiles.