These LLMs are the best at resisting Russian propaganda
As more people rely on large language models to provide pat answers to complex questions, state governments are understandably worried about those LLMs spouting what they see as dangerous propaganda promoted by foreign adversaries.
To help combat this problem, the government-sponsored Estonian Language Institute (ELI) has released a new “Propaganda Resistance” benchmark ranking dozens of LLMs on their ability to avoid “tak[ing] positions on topics that the Russian Federation uses in its strategic narratives.”
Estonia is a former member of the Soviet Union that has been independent for just a few decades, and many Estonians are particularly alert to what they see as false narratives being promoted by their large and often belligerent neighbor to the east.
Alongside volunteer-run Estonian defense collective Propastop, the ELI identified 14 broad categories in which it sees Russian influence operations trying to sway public discussion. These range from narratives on the current status of Crimea and justifications for the war in Ukraine to the history of NATO and justification for Russia’s annexation of Baltic states during World War II.
For each category of propaganda, the researchers developed separate questions that were phrased neutrally, biased with “false assumptions” based on Russian propaganda, or written to maliciously attempt to elicit explicit misinformation from the LLM. Questions were provided to the models in English, Estonian, and Russian, and judged by a separate AI model (calibrated to align with Propastop experts) based on the models’ ability to “push back on propaganda narratives, without external help” from web search or other external tools.
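The test design described above can be pictured as a grid of category, question framing, and language. The Python sketch below is purely illustrative and is not the ELI's actual code: the names `Prompt`, `build_grid`, and `mean_score_by` are hypothetical, and it assumes one question per grid cell and a numeric judge score per answer, neither of which the benchmark description confirms.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# The three framings and three languages named in the article.
FRAMINGS = ("neutral", "biased", "malicious")
LANGUAGES = ("en", "et", "ru")


@dataclass(frozen=True)
class Prompt:
    """One cell of the hypothetical test grid."""
    category: str
    framing: str
    language: str


def build_grid(categories):
    """Return one Prompt per (category, framing, language) combination.

    Assumption: a single question per cell; the real benchmark may use more.
    """
    return [Prompt(c, f, l) for c, f, l in product(categories, FRAMINGS, LANGUAGES)]


def mean_score_by(prompts, scores, key):
    """Average judge scores grouped by a Prompt attribute such as 'language'."""
    groups = {}
    for prompt, score in zip(prompts, scores):
        groups.setdefault(getattr(prompt, key), []).append(score)
    return {name: mean(values) for name, values in groups.items()}
```

Under these assumptions, 14 categories would yield 14 × 3 × 3 = 126 prompts per model. Grouping the judge's scores by `framing` or by `language` would then show where a given model is weakest.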
The rankings
Anthropic’s Claude models tended to perform the best of the proprietary frontier models on this new benchmark, with various recent versions of its Sonnet and Opus models taking six of the top 10 spots. Opus 4.7, the best-performing model overall, received a top-rated “Exemplary” mark for its response on a full 77 percent of questions (and a middling “mediocre” on just 2 percent) for a mean final score of 94.9 out of 100 on the benchmark.
Open-weight models, including Nvidia’s Nemotron and Alibaba’s Qwen, showed strong results comparable to Anthropic’s best models. GPT-5.4, the best-performing model from OpenAI, also performed relatively well on the benchmark, providing “Exemplary” responses on 54 percent of questions and achieving an 88.9 mean score.
Unsurprisingly, recent frontier models showed a much stronger tendency to resist Russian propaganda than models from just a few years ago. Claude 3.5 Haiku, the highest-rated model released in 2024, received a mean rating of just 73.1 on the benchmark. That mark would put it in the bottom third of models released in 2026 on this metric.
But that improvement over time was not uniform across all LLM makers. Google’s most propaganda-resistant LLM, Gemini 2.5 Pro, is nearly a year old now and has only reached a mean score of 82 on the benchmark, largely due to a particular susceptibility to maliciously worded prompts. The most recent tested Google model, Gemini 3.5 Flash, only scored a 73 on the benchmark, comparable to Anthropic models released nearly two years ago.
In a supporting post on the Propastop blog, the organization highlights how many models showed much less resistance to Russian propaganda when questioned in Russian. Google’s Gemini 3.5 Flash received significantly lower benchmark scores in Russian than in English, as did open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash.
What one country sees as propaganda, of course, another might see as a set of important cultural truths that LLMs should support and reflect. A recent study from King’s College professor Gregory Asmolov analyzes how the Russian government, through recent technical alliances with other BRICS countries, is seeking to influence AI models by projecting specific sociopolitical positions that are “culturally sensitive” to Russia’s viewpoints.