Assert, don't describe: Linguistic features that shift LLM reasoning about animal welfare
Abstract: Animal-welfare advocates produce a large volume of writing, and that writing increasingly ends up training the language models that millions of people then ask about animal welfare. Using vocabulary-matched stance-contrast probes on a held-out animal-welfare benchmark, we measure how each of ten linguistic features, when present in fine-tuning data, changes Llama-3.2-1B’s preference for pro-animal-welfare reasoning.
Nine of the ten features produce statistically significant shifts. Seven move the model toward stronger pro-animal-welfare reasoning: assertive certainty, explicit moral vocabulary, emotion words, evaluative claims, narrative structure, depicted harm severity, and immediate temporal framing. Two move it the other way: hedged language and concrete sensory description both dilute the pro-animal-welfare stance. First-person perspective has no statistically significant effect.
The practical recommendation for anyone writing animal-welfare text that may end up in LLM training corpora is to assert a position rather than describe a scene neutrally. The features that strengthen the model’s pro-animal-welfare reasoning are the ones that make the writer’s position explicit; the features that dilute it carry animal-welfare content but withhold a stance.
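As a rough illustration of what a stance-contrast measurement could look like, the sketch below scores each probe as the log-probability margin between a pro-animal-welfare continuation and a vocabulary-matched contrasting continuation, then compares the mean margin before and after fine-tuning. The abstract does not specify the actual probe format or scoring rule, so everything here is an assumption: the `Scorer` type, the `(prompt, pro, anti)` probe layout, and the function names `stance_preference` and `feature_shift` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical scorer: maps (prompt, continuation) to the model's total
# log-probability of the continuation given the prompt. In practice this
# would wrap a causal language model; here it is left abstract.
Scorer = Callable[[str, str], float]

# Hypothetical probe layout: (prompt, pro-welfare continuation,
# vocabulary-matched contrasting continuation).
Probe = Tuple[str, str, str]


def stance_preference(score: Scorer, probes: Sequence[Probe]) -> float:
    """Mean log-probability margin of the pro continuation over the contrast."""
    return mean(score(prompt, pro) - score(prompt, anti)
                for prompt, pro, anti in probes)


def feature_shift(base: Scorer, tuned: Scorer, probes: Sequence[Probe]) -> float:
    """Change in stance preference after fine-tuning (positive = more pro-welfare)."""
    return stance_preference(tuned, probes) - stance_preference(base, probes)
```

Under these assumptions, a positive `feature_shift` for a model fine-tuned on text with a given feature would correspond to a shift toward pro-animal-welfare reasoning, and a negative value to dilution of that stance.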