Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Abstract: As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) and two domain-adapted models (AfroConfliBERT and AfroConfliLLAMA) on conflict-event classification in Nigeria and Cameroon against ACLED, a gold-standard dataset with multi-stage verification.
We find that the two model families diverge sharply in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality, with Legitimization Bias differences statistically indistinguishable from zero.
Yet domain adaptation does not eliminate actor-based selection bias. Both adapted models show statistically significant actor bias comparable to vanilla LLMs; in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases produce flip rates up to 66.7% in Cameroon and 34.2% in Nigeria, while perturbations salient in one context may not matter in another.
Error-trace profiling shows that open-weight models mask normative bias behind unfaithful, confabulated rationales. In contrast, AfroConfliBERT and AfroConfliLLAMA are largely robust, with near-zero flip rates across perturbation categories. Overall, current models are not ready for unsupervised deployment in conflict monitoring. We call for fairness-aware fine-tuning to reduce actor-based selection bias, mandatory adversarial robustness evaluation against lexical manipulation, and context-specific human-in-the-loop oversight calibrated to regional difficulty.
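For concreteness, the two headline metrics above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the label strings, the definition of Legitimization Bias as the difference between False Legitimation and False Illegitimation rates, and the toy data are all assumptions.

```python
# Hypothetical sketch of two metrics: perturbation "flip rate" and a
# Legitimization Bias difference. Labels and data are illustrative only.

def flip_rate(original_preds, perturbed_preds):
    """Fraction of events whose predicted label changes after a lexical
    perturbation (e.g. inserting a delegitimizing phrase)."""
    flips = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return flips / len(original_preds)

def legitimization_bias(gold, pred,
                        legitimate="battle",
                        illegitimate="violence_against_civilians"):
    """False Legitimation rate minus False Illegitimation rate.
    Positive: the model over-legitimizes; negative: it over-delegitimizes;
    near zero: directional neutrality."""
    false_legit = sum(1 for g, p in zip(gold, pred)
                      if g == illegitimate and p == legitimate)
    false_illegit = sum(1 for g, p in zip(gold, pred)
                        if g == legitimate and p == illegitimate)
    n_illegit = sum(1 for g in gold if g == illegitimate)
    n_legit = sum(1 for g in gold if g == legitimate)
    return false_legit / max(n_illegit, 1) - false_illegit / max(n_legit, 1)

# Toy example: one of three predictions flips under perturbation,
# and one of two legitimate battles is falsely delegitimized.
gold = ["battle", "battle", "violence_against_civilians"]
orig = ["battle", "battle", "violence_against_civilians"]
pert = ["battle", "violence_against_civilians", "violence_against_civilians"]
print(flip_rate(orig, pert))              # 1/3
print(legitimization_bias(gold, pert))    # 0/1 - 1/2 = -0.5
```

Under this framing, the open-weight pattern in the abstract (18.29% False Illegitimation, zero False Legitimation) corresponds to a strongly negative bias value, while the domain-adapted models sit near zero.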