Influential study touting ChatGPT in education retracted over red flags

一项吹捧 ChatGPT 在教育领域作用的重磅研究因存在严重问题被撤稿

A study that claimed OpenAI’s ChatGPT can positively impact student learning has been retracted nearly one year after publication. The journal publisher, Springer Nature, cited “discrepancies” in the analysis and a lack of confidence in the conclusions—but not before the paper racked up hundreds of citations and made the rounds on social media.

一项声称 OpenAI 的 ChatGPT 能对学生学习产生积极影响的研究，在发表近一年后被撤稿。期刊出版商施普林格·自然（Springer Nature）指出，该研究的分析存在“差异”，且对结论缺乏信心——但在撤稿之前，这篇论文已经获得了数百次引用，并在社交媒体上广为流传。

“The paper’s authors made some very attention-grabbing claims about the benefits of ChatGPT on learning outcomes,” said Ben Williamson, a senior lecturer at the Centre for Research in Digital Education and the Edinburgh Futures Institute at the University of Edinburgh in Scotland, in an email to Ars. “It was treated by many on social media as one of the first pieces of hard, gold standard evidence that ChatGPT, and generative AI more broadly, benefits learners.”

“论文作者就 ChatGPT 对学习成果的益处提出了一些非常引人注目的主张，”苏格兰爱丁堡大学数字教育研究中心及爱丁堡未来研究所的高级讲师本·威廉姆森（Ben Williamson）在给 Ars 的电子邮件中表示。“它被社交媒体上的许多人视为 ChatGPT 以及更广泛的生成式 AI 有益于学习者的首批硬核、黄金标准证据之一。”

The retracted paper attempted to quantify “the effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking” by analyzing results from 51 previous research studies. Its meta-analysis calculated the effect size between various studies’ experimental groups that used ChatGPT in education and control groups that did not use the AI chatbot. That analysis supposedly showed how “ChatGPT has a large positive impact on improving learning performance” along with a “moderately positive impact on enhancing learning perception” and “fostering higher-order thinking,” according to the researchers who authored the paper.

这篇被撤回的论文试图通过分析此前 51 项研究的结果，来量化“ChatGPT 对学生学习表现、学习感知和高阶思维的影响”。其元分析计算了各研究中在教育中使用 ChatGPT 的实验组与未使用该 AI 聊天机器人的对照组之间的效应量。按论文作者的说法，该分析显示“ChatGPT 在提高学习表现方面具有较大的积极影响”，同时在“增强学习感知”和“培养高阶思维”方面具有“中等程度的积极影响”。
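
For readers unfamiliar with the method, the following is a minimal sketch of how a meta-analysis of this general kind can turn each study's treatment and control results into a standardized effect size and then pool them. It is not the retracted paper's actual procedure, which the article does not describe in detail. The function names `hedges_g` and `pool_fixed_effect` are hypothetical, the choice of Hedges' g with fixed-effect inverse-variance pooling is an assumption, and the input numbers are invented purely for illustration.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance) for a treatment group vs. a control group."""
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)          # small-sample correction
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d

def pool_fixed_effect(effects):
    """Pool (g, variance) pairs; return (pooled g, Cochran's Q, I^2)."""
    weights = [1 / var for _, var in effects]  # inverse-variance weights
    pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (g - pooled) ** 2 for w, (g, _) in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2

# Invented summary statistics for three imaginary studies:
# (treatment mean, SD, n, control mean, SD, n)
studies = [
    (78.0, 10.0, 30, 72.0, 10.0, 30),
    (65.0, 12.0, 25, 64.0, 11.0, 27),
    (88.0, 8.0, 40, 70.0, 9.0, 38),
]
effects = [hedges_g(*s) for s in studies]
pooled, q, i2 = pool_fixed_effect(effects)
print(f"pooled g = {pooled:.2f}, Q = {q:.2f}, I^2 = {i2:.0%}")
```

A high I² value in a sketch like this signals that the studies disagree with one another far more than chance would explain, which is one way the concern about combining studies with very different methods, populations, and samples can show up numerically.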

The now-retracted results first appeared in the journal Humanities & Social Sciences Communications, published by Springer Nature on May 6, 2025. “In some cases it appears it was synthesizing very poor quality studies, or mixing together findings from studies that simply cannot be accurately compared due to very different methods, populations, and samples,” Williamson told Ars. “It really seemed like a paper that should not have been published in the first place.”

这些现已被撤回的研究结果最初于 2025 年 5 月 6 日发表在施普林格·自然旗下的《人文与社会科学通讯》（Humanities & Social Sciences Communications）期刊上。“在某些情况下，它似乎是在综合质量非常低的研究，或者将那些由于方法、人群和样本差异巨大而根本无法准确比较的研究结果混为一谈，”威廉姆森告诉 Ars。“它看起来确实像是一篇从一开始就不该发表的论文。”

Williamson also questioned the timing of the paper’s publication just two and a half years after OpenAI released ChatGPT in November 2022. “It is not feasible that dozens of high-quality studies about ChatGPT and learning performance could have been conducted, reviewed, and published in that time,” Williamson said.

威廉姆森还质疑了该论文的发表时间，因为距离 OpenAI 于 2022 年 11 月发布 ChatGPT 仅过去了两年半。“在那么短的时间内，不可能完成、评审并发表数十项关于 ChatGPT 与学习表现的高质量研究，”威廉姆森说。

A legacy that may outlive retraction

可能比撤稿影响更深远的“遗产”

Since its publication, the study has been cited 262 times in other papers published in Springer Nature’s peer-reviewed journals and has received a total of 504 citations from both peer-reviewed and non-peer-reviewed sources. It also attracted nearly half a million readers and received enough online attention to rank in the 99th percentile of journal articles by attention score.

自发表以来，该研究在施普林格·自然旗下同行评审期刊发表的其他论文中被引用了 262 次，从同行评审和非同行评审来源处共获得了 504 次引用。它还吸引了近 50 万名读者，并获得了足够的在线关注度，其关注度得分在期刊文章中排名前 1%（第 99 百分位）。

“Of course, the problem with this form of social media circulation is that all of the details about the study got stripped away,” Williamson said. “All that was left were the major claims, which certain social media users helped boost and propel. All this helped the paper get a huge amount of attention, even though the findings really were not supported by the underlying research at all.”

“当然，这种社交媒体传播方式的问题在于，关于研究的所有细节都被剥离了，”威廉姆森说。“剩下的只有主要论点，而某些社交媒体用户又推波助澜。所有这些都帮助这篇论文获得了巨大的关注，尽管其结论根本没有得到底层研究的支持。”

Williamson has not been alone in such concerns. When the paper was first published, Ilkka Tuomi, chief scientist of the research institute Meaning Processing Ltd., posted on LinkedIn about the pitfalls of such meta-analysis studies attempting to “draw conclusions about incompatible and ill-defined outcomes” from experimental results involving very different populations. “The only reason to do these studies seems to be that statistics and meta-analysis tools can crunch out numbers that look [like] science,” Tuomi wrote.

持有这种担忧的不止威廉姆森一人。当该论文首次发表时，研究机构 Meaning Processing Ltd. 的首席科学家伊尔卡·图奥米（Ilkka Tuomi）在 LinkedIn 上发文，指出了此类元分析研究的陷阱，即试图从涉及截然不同人群的实验结果中“对不兼容且定义模糊的结果得出结论”。“做这些研究的唯一原因似乎是统计学和元分析工具可以计算出看起来‘像科学’的数字，”图奥米写道。

On April 22, 2026, almost a year after initial publication, Springer Nature posted a retraction notice. “The Editor has decided to retract this paper owing to concerns regarding discrepancies in the meta-analysis,” the notice said. “These issues ultimately undermine the confidence the Editor can place in the validity of the analysis and resulting conclusions.” The publisher also stated that “the authors had not responded to correspondence regarding the retraction.”

2026 年 4 月 22 日，在最初发表近一年后，施普林格·自然发布了撤稿通知。“编辑决定撤回这篇论文，原因是担心元分析中存在差异，”撤稿通知中写道。“这些问题最终削弱了编辑对分析有效性及所得结论的信心。”该出版商还表示，“作者未回复有关撤稿的信函”。

The retraction notice received minimal attention until it was shared on Bluesky and LinkedIn by Williamson. He expressed concern that many researchers and others who initially read the paper will not realize it was retracted, meaning that the “headline finding that ChatGPT helps learning performance might persist despite its retraction.”

撤稿通知在被威廉姆森分享到 Bluesky 和 LinkedIn 之前几乎没有引起关注。他表示担心，许多最初阅读过该论文的研究人员和其他人可能不会意识到它已被撤回，这意味着“ChatGPT 有助于提高学习表现这一头条结论，尽管已被撤稿，却可能依然存在”。

“All of this is hugely frustrating for those of us trying hard to make sense of what AI means for learning, teaching, and education more generally,” Williamson told Ars. “We have had several years of hype about AI in education, but what we have really needed is high-quality research that can actually show us what kinds of impacts AI is having in classrooms and learning practices.”

“对于我们这些努力试图理解 AI 对学习、教学以及更广泛的教育意味着什么的人来说，这一切都令人非常沮丧，”威廉姆森告诉 Ars。“我们已经经历了几年关于 AI 教育的炒作，但我们真正需要的是高质量的研究，能够真正向我们展示 AI 在课堂和学习实践中到底产生了什么样的影响。”

Many educators have scrambled to adapt their classes to prevent AI-enabled student cheating and expressed discouragement with how widely available generative AI tools have shifted many students’ mindsets away from learning and critical thinking. Tech companies continue to promote AI chatbot tools as “study mode” learning tools and for generating SAT practice tests. Meanwhile, at least one country is pivoting away from digital resources by reintroducing physical books and having students return to using pen and paper.

许多教育工作者忙于调整课程以防止学生利用 AI 作弊，并对生成式 AI 工具的广泛普及如何使许多学生的思维偏离学习和批判性思考表示灰心。科技公司继续将 AI 聊天机器人工具宣传为“学习模式”工具，并用于生成 SAT 模拟试题。与此同时，至少有一个国家正在减少对数字化资源的依赖，重新引入纸质书籍，让学生回归使用纸笔学习。