Assert, don't describe: Linguistic features that shift LLM reasoning about animal welfare

Abstract: Animal-welfare advocates produce a lot of writing, and increasingly that writing trains the language models that millions of people then ask about animal welfare. Using vocabulary-matched stance-contrast probes on a held-out animal-welfare benchmark, we measure how each of ten linguistic features, when present in fine-tuning data, changes Llama-3.2-1B’s preference for pro-animal-welfare reasoning.
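
To make the measurement concrete, here is a minimal sketch of how a stance-contrast probe could be scored. It is an illustration under stated assumptions, not the procedure used in this work. It assumes a probe is a prompt paired with two continuations that share near-identical vocabulary but take opposite stances, and that the model's preference is the gap in log-probability between them. The names `LogProbFn`, `stance_preference`, `feature_shift`, and `toy_scorer` are hypothetical, and the toy scorer stands in for a real model such as Llama-3.2-1B.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: total log-probability a model assigns to
# `continuation` given `prompt`. In practice this would wrap a real model.
LogProbFn = Callable[[str, str], float]

# (prompt, pro-welfare continuation, contrasting continuation)
Probe = Tuple[str, str, str]


def stance_preference(logprob: LogProbFn, probes: Sequence[Probe]) -> float:
    """Mean log-probability margin of the pro-welfare continuation over its contrast."""
    if not probes:
        raise ValueError("need at least one probe")
    margins = [
        logprob(prompt, pro) - logprob(prompt, contra)
        for prompt, pro, contra in probes
    ]
    return sum(margins) / len(margins)


def feature_shift(base: LogProbFn, tuned: LogProbFn, probes: Sequence[Probe]) -> float:
    """Change in stance preference after fine-tuning (positive = more pro-welfare)."""
    return stance_preference(tuned, probes) - stance_preference(base, probes)


def toy_scorer(bias: float) -> LogProbFn:
    """Illustrative stand-in for a model: favors continuations starting with 'No' by `bias`."""
    def score(prompt: str, continuation: str) -> float:
        return bias if continuation.startswith("No") else -bias
    return score


# Assumed example probes; each pair differs in stance but shares most of its wording.
probes = [
    ("Is it acceptable to keep hens in battery cages?",
     "No, the cages cause the hens serious suffering.",
     "Yes, the cages cause the hens no serious suffering."),
    ("Is it fine to transport pigs for days without water?",
     "No, that transport seriously harms the pigs.",
     "Yes, that transport hardly harms the pigs."),
]

shift = feature_shift(toy_scorer(0.25), toy_scorer(0.75), probes)
print(shift)  # 1.0 with these toy scorers
```

Keeping the two continuations lexically close is the point of the design: a difference in score then reflects stance rather than topic words the fine-tuning data happened to contain.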

Nine of the ten features produce statistically significant shifts. Seven move the model toward stronger pro-animal-welfare reasoning: assertive certainty, explicit moral vocabulary, emotion words, evaluative claims, narrative structure, depicted harm severity, and immediate temporal framing. Two move it the other way: hedged language and concrete sensory description both dilute the pro-animal-welfare stance. First-person perspective has no statistically significant effect.

The practical recommendation for anyone writing animal-welfare text that may end up in LLM training corpora: assert a position rather than describe a scene neutrally. The features that shift the model are the ones that make the writer’s position explicit; the features that dilute it carry animal-welfare content but withhold stance.
