Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Abstract: As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) and two domain-adapted models (AfroConfliBERT and AfroConfliLLAMA) on conflict-event classification in Nigeria and Cameroon against ACLED, a gold-standard dataset with multi-stage verification.
We find that the two model families diverge sharply in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality, with Legitimization Bias differences statistically indistinguishable from zero.
Yet domain adaptation does not eliminate actor-based selection bias. Both adapted models show statistically significant actor bias comparable to vanilla LLMs; in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases produce flip rates up to 66.7% in Cameroon and 34.2% in Nigeria, while perturbations salient in one context may not matter in another.
Error-trace profiling shows that open-weight models mask normative bias behind unfaithful, confabulated rationales. In contrast, AfroConfliBERT and AfroConfliLLAMA are largely robust, with near-zero flip rates across perturbation categories. Overall, current models are not ready for unsupervised deployment in conflict monitoring. We call for fairness-aware fine-tuning to reduce actor-based selection bias, mandatory adversarial robustness evaluation against lexical manipulation, and context-specific human-in-the-loop oversight calibrated to regional difficulty.
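For concreteness, the two headline metrics above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the label strings, the definition of Legitimization Bias as the difference between False Legitimation and False Illegitimation rates, and the toy data are all assumptions.

```python
# Hypothetical sketch of two metrics: perturbation "flip rate" and a
# Legitimization Bias difference. Labels and data are illustrative only.

def flip_rate(original_preds, perturbed_preds):
    """Fraction of events whose predicted label changes after a lexical
    perturbation (e.g. inserting a delegitimizing phrase)."""
    flips = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return flips / len(original_preds)

def legitimization_bias(gold, pred,
                        legitimate="battle",
                        illegitimate="violence_against_civilians"):
    """False Legitimation rate minus False Illegitimation rate.
    Positive: the model over-legitimizes; negative: it over-delegitimizes;
    near zero: directional neutrality."""
    false_legit = sum(1 for g, p in zip(gold, pred)
                      if g == illegitimate and p == legitimate)
    false_illegit = sum(1 for g, p in zip(gold, pred)
                        if g == legitimate and p == illegitimate)
    n_illegit = sum(1 for g in gold if g == illegitimate)
    n_legit = sum(1 for g in gold if g == legitimate)
    return false_legit / max(n_illegit, 1) - false_illegit / max(n_legit, 1)

# Toy example: one of three predictions flips under perturbation,
# and one of two legitimate battles is falsely delegitimized.
gold = ["battle", "battle", "violence_against_civilians"]
orig = ["battle", "battle", "violence_against_civilians"]
pert = ["battle", "violence_against_civilians", "violence_against_civilians"]
print(flip_rate(orig, pert))              # 1/3
print(legitimization_bias(gold, pert))    # 0/1 - 1/2 = -0.5
```

Under this framing, the open-weight pattern in the abstract (18.29% False Illegitimation, zero False Legitimation) corresponds to a strongly negative bias value, while the domain-adapted models sit near zero.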