Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

Abstract: Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate because text generation is stochastic. Standard auditing methods rely on inspecting a single output or on static automated metrics. Both approaches obscure the underlying probability distributions and miss biases hidden in lower-probability generation branches.

This paper introduces TreeTracer, a visual analytics tool designed to evaluate LLM bias through aggregated comparison. Using a systematic perturbation analysis pipeline, the tool replaces ontology-defined terms in each input prompt, aggregates hundreds of stochastic generations into a syntax-aligned hierarchical structure, and then performs classification-aware node merging with an auxiliary language model. The resulting structure is visualized through a custom Sankey diagram.
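
The aggregation step can be illustrated with a minimal sketch. It is not TreeTracer's implementation: it assumes whitespace tokenization and a plain prefix tree in place of the syntax-aligned structure, and it omits the auxiliary-model node merging entirely. The function names, the prompt template, and the sample generations are all hypothetical. The weighted edges it returns are the kind of (source, target, value) links a Sankey diagram consumes.

```python
from collections import Counter


def perturb(template, slot, terms):
    """Fill one slot of a prompt template with each term (hypothetical helper)."""
    return {term: template.replace(slot, term) for term in terms}


def sankey_links(generations, max_depth=3):
    """Aggregate sampled generations into weighted prefix-tree edges.

    A node is the tuple of tokens leading to it, so the same word on two
    different branches stays distinct. An edge weight counts how many
    generations pass through that edge.
    """
    links = Counter()
    for text in generations:
        tokens = text.split()[:max_depth]
        for depth in range(len(tokens)):
            parent = tuple(tokens[:depth])
            child = tuple(tokens[:depth + 1])
            links[(parent, child)] += 1
    return links


# Hypothetical template and hand-written stand-ins for sampled model output.
prompts = perturb("The {ROLE} said that", "{ROLE}", ["nurse", "engineer"])
samples = {
    "nurse": ["she was tired", "she was late", "he was tired"],
    "engineer": ["he was busy", "he was late", "she was busy"],
}

# One tree per perturbed prompt; placing them side by side gives the comparison.
trees = {term: sankey_links(texts) for term, texts in samples.items()}
for term, links in trees.items():
    print(prompts[term])
    for (parent, child), weight in sorted(links.items()):
        print("  ", " ".join(parent) or "<root>", "->", child[-1], weight)
```

Keying nodes by their full prefix keeps low-probability branches visible as thin edges rather than folding them into a dominant path, which is the property the aggregated view relies on.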

By juxtaposing two ontology-driven trees, the workspace enables direct comparison between semantic contexts and supports systematic bias detection. Because any visualization reflects only a subset of the model’s learned behavior, the system further applies contrastive inference to compute and directly display counterfactual token probabilities across contexts, reducing the risk of misinterpreting the presence of bias.
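
One possible reading of the counterfactual comparison is sketched below: the probability of the same next token is computed under two contexts and reported side by side with a log ratio. This is an assumption about what "contrastive inference" involves, not the system's documented procedure. The logit values are invented for illustration, and `counterfactual_table` is a hypothetical name.

```python
import math


def softmax(logits):
    """Convert a token -> logit mapping into a token -> probability mapping."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def counterfactual_table(logits_a, logits_b, tokens):
    """For each token, return (token, p under A, p under B, log(pA / pB))."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    return [
        (tok, p_a[tok], p_b[tok], math.log(p_a[tok] / p_b[tok]))
        for tok in tokens
    ]


# Invented next-token logits for two contexts that differ in a single term.
logits_nurse = {"she": 2.0, "he": 0.0, "they": -1.0}
logits_engineer = {"she": 0.0, "he": 2.0, "they": -1.0}

for tok, pa, pb, log_ratio in counterfactual_table(
    logits_nurse, logits_engineer, ["she", "he", "they"]
):
    print(f"{tok:>5}  A={pa:.3f}  B={pb:.3f}  log-ratio={log_ratio:+.2f}")
```

A token that never appears in the sampled tree for one context can still carry non-zero probability there. Reading the probabilities directly separates a token that is truly suppressed from one that was simply not sampled.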

We validate the workspace through case studies that compare an unaligned baseline model, GPT-2 XL, against the constitutionally aligned Apertus models. The visual aggregation successfully exposes hidden representational harms, such as counterfactual pronoun suppression and conversational marginalization of individuals. A preliminary user study confirms that the aggregated comparative interface reduces cognitive load and effectively supports analysts in detecting systemic biases.
