Research repository arXiv will ban authors for a year if they let AI do all the work

ArXiv, a widely used open repository for preprint research, is doing more to crack down on the careless use of large language models in scientific papers. Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced “archive”) has become one of the main ways that research circulates in fields like computer science and math, and the site itself has become a source of data on trends in scientific research.

ArXiv has already taken steps to combat a growing number of low-quality, AI-generated papers, for example by requiring first-time posters to get an endorsement from an established author. And after being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, which should allow it to raise more money to address issues like AI slop.

In its latest move, Thomas Dietterich — the chair of arXiv’s computer science section — posted Thursday that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.” That incontrovertible evidence could include things like “hallucinated references” and comments to or from the LLM, Dietterich said.

If such evidence is found, a paper’s authors will face “a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue.” Note that this isn’t an outright prohibition on using LLMs, but rather an insistence that, as Dietterich put it, authors take “full responsibility” for the content, “irrespective of how the contents are generated.”

So if researchers copy-paste “inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content” directly from an LLM, then they’re still responsible for it. Dietterich told 404 Media that this will be a “one-strike” rule, but moderators must flag the issue and section chairs must confirm the evidence before imposing the penalty. Authors will also be able to appeal the decision.

Recent peer-reviewed research has found that fabricated citations are on the rise in biomedical research, likely due to LLMs — though to be fair, scientists aren’t the only ones getting caught using citations that were made up by AI.
