Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect
Abstract: We introduce Xiaohongshu Social Comparison Reader Elicitation (XHS-SCoRE), a reader-grounded benchmark for detecting whether a text-only Xiaohongshu (RedNote) post elicits UPWARD, DOWNWARD, or NEUTRAL/no clear social comparison from a first-person reader perspective.
The task targets a socially meaningful relational signal that is behaviorally real yet not reducible to sentiment. Across prompted LLM classifiers and supervised Chinese encoder baselines, we find a consistent mismatch between generation fluency and reliable detection ability: the signal is textually learnable in-domain, but not robustly accessible to prompt-based classification.
Prompted LLM classifiers exhibit stable, interpretable failure modes, most notably the neutralization of comparison-triggering posts and model-specific directional skew. A controlled pilot further shows that LLM-generated Xiaohongshu-style posts can shift readers' perceived standing and comparison-related affect even while prompt-based detection of the same construct remains fragile.
XHS-SCoRE contributes both a benchmark for reader-grounded comparison detection and a diagnostic framework for studying when socially meaningful relational cues remain only partially visible to prompt-based inference.