EY Canada published a cybersecurity report and most citations were hallucinated

Earlier this year, an engineer at GPTZero coined the term “vibe citing” to describe the accidental creation of fake references via LLM hallucinations. It turns out that the friction of creating and checking citations is leading many researchers, consultants, lawyers, and public officials to embrace the vibe (if you know what we mean).

Among the converts are the authors of a 2025 Ernst & Young report titled Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems. This report, stuffed with fake citations and inaccurate claims, is surfacing in newspapers, blog posts, and AI search overviews, poisoning the data that both human researchers and AI agents rely on.

GPTZero began targeting vibe citations with our Hallucination Check tool in 2025. We have since used it in investigations of a government publication, two different Deloitte reports, and submissions to prestigious machine learning / artificial intelligence conferences like NeurIPS and ICLR. Over the past few months we’ve set up an automated pipeline to search for vibe citations by finding and scanning public reports from major consulting firms. What we’ve found suggests that the vibe citing epidemic is already endemic, even among the major players.

Instead of releasing our results all at once, we’re going to focus on one report at a time. This approach both prevents individual examples from being overlooked and allows us to illustrate the negative impacts of vibe citing on research quality and public trust.

On the menu: Ernst & Young (EY)

Ernst & Young is one of the “big four” global consulting firms, providing accounting and consulting services to governments and private entities from 150 offices around the world. The Canadian member firm (EY Canada) provides millions of dollars of services to the Canadian government annually.

In late 2025, EY Canada published a 44-page cybersecurity report titled Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems. While credited to three employees (two partners and one senior manager), the document is a collage of vibe citations, misattributions, fake statistics, and AI-written text.

Why the Vibes Are Bad

EY Canada’s report doesn’t use footnotes or normal academic citations. Instead, it references sources directly in the text, lists them in a resources table (pp. 41-43), or both. This table provides a source title, description, and URL for all sources, as well as the publisher and date in certain cases. Almost all of the URLs are broken or fake, and more than half of the titles don’t correspond to real sources.

GPTZero uses a very specific definition of vibe citation because of the potential reputational cost (to both us and the report’s authors) of false positives. One of our team members manually verified Hallucination Check’s results to ensure their accuracy.

(Note: The original article includes a table of specific hallucinated citations, such as broken links to BleepingComputer, Wired, Gartner, Forbes, McKinsey, Cisco Talos, and TechCrunch, all of which were verified as non-existent or incorrect.)

During our previous analysis of academic conference submissions, we found that many authors primarily used AI to generate and format their references, resulting in papers with vibed citations but low AI text scores overall.

However, it’s hard to find human fingerprints in Points of Attack (harder, even, than finding a human-written LinkedIn post). Not only does the text scan as AI-generated, it’s riddled with common LLM errors like fake statistics, misattributions, and internal contradictions.