AI chatbots are giving out people’s real phone numbers

A Redditor recently wrote that he was “desperate for help”: for about a month, he said, his phone had been inundated with calls from “strangers” who were “looking for a lawyer, a product designer, a locksmith.” The callers had apparently been misdirected by Google’s generative AI.

In March, a software developer in Israel was contacted by a stranger on WhatsApp after Google’s chatbot Gemini provided incorrect customer service instructions that included his number. And in April, a PhD candidate at the University of Washington was messing around on Gemini and got it to cough up her colleague’s personal cell phone number.

AI researchers and online privacy experts have long warned of the myriad dangers generative AI poses for personal privacy. These cases give us yet another scenario to worry about: generative AI exposing people’s real phone numbers. (The Redditor did not respond to multiple requests for comment, and we could not independently verify his story.)

Experts say that these privacy lapses are most likely due to personally identifiable information (PII) being used in training data, though it’s hard to pin down the exact mechanism causing real phone numbers to show up in AI-generated responses. But no matter the reason, the result is not fun for people on the receiving end—and, even more worryingly, there appears to be little that anyone can do to stop it.

AI-related privacy requests surge 400%

It’s impossible to know how often people’s phone numbers are exposed by AI chatbots, but experts say they believe it is happening far more than is reported publicly. DeleteMe, a company that helps customers remove their personal information from the internet, says customer queries about generative AI have increased by 400%—up to a few thousand—in the last seven months.

These queries “specifically reference ChatGPT, Claude, Gemini … or other generative AI tools,” says Rob Shavell, the company’s cofounder and CEO. Of these concerns about generative AI, 55% reference ChatGPT, 20% reference Gemini, 15% Claude, and 10% other AI tools, Shavell says. (MIT Technology Review has a business subscription to DeleteMe.)

Shavell says customer complaints about personal information being surfaced by LLMs usually take one of two forms. Either “a customer asks a chatbot something innocuous about themselves and gets back accurate home addresses, phone numbers, family members’ names, or employer details,” or a customer discovers and reports the exposure of someone else’s personal data, when “the chatbot generates plausible-but-wrong contact information.”

This aligns with what happened to Daniel Abraham, a 28-year-old software engineer in Israel. In mid-March, he says, a stranger sent him a “weird WhatsApp message from an unknown number” asking for help with his account in PayBox, an Israeli payment app. “I thought it was a spam message,” he wrote to MIT Technology Review in an email—“someone who was trying to troll me.”

But when he asked the stranger how they had found his number, they sent him a screenshot of Gemini’s instructions to contact PayBox customer service via WhatsApp—giving his personal number. Abraham does not work for PayBox, and PayBox does not have a WhatsApp customer service number, Elad Gabay, a customer service representative for the company, confirmed.

Later, Abraham asked Gemini how to contact PayBox, and it generated another person’s WhatsApp number. When I recently asked, Gemini again responded with an Israeli phone number—it belonged not to PayBox, but to a separate credit card company that works with PayBox.

Abraham’s exchange with the stranger ended quickly, but he said he was concerned about how similar exchanges could quickly turn sour, leading to “harassment or other bad interactions.” “What if I asked for money in order to ‘solve’ that [customer service] issue?” he said.

To try to figure out how this happened, Abraham ran a regular Google search on his phone number, and he found that it had been shared online once, back in 2015, on a local site similar to Quora. Though he’s not sure who posted it there, it may explain how it ended up being reproduced by Gemini over a decade later.

Chatbots like Gemini, OpenAI’s ChatGPT, and Anthropic’s Claude are built on LLMs that are trained on huge amounts of data scraped from across the web. This inevitably includes hundreds of millions of instances of PII. As we reported last summer, for example, DataComp CommonPool, a large and popular open-source data set that has been used to train image-generation models, included copies of résumés, driver’s licenses, and credit cards.

The likelihood of PII appearing in AI training data is only increasing as public data “runs out” and AI companies look for new sources of high-quality training data. This includes information from data brokers and people-search websites. According to the California data broker registry, for instance, 31 of the 578 registered data brokers operating in the state self-reported that they had “shared or sold consumers’ data to a developer of a GenAI system or model in the past year.”

Furthermore, models are known to memorize and reproduce data verbatim from training data sets—and recent research suggests that it is not just frequently appearing data that is most likely to be memorized.

Imperfect measures


It’s standard practice now to build guardrails into an LLM’s design to constrain certain outputs, ranging from content filters meant to identify and prevent chatbots from releasing PII to Anthropic’s instructions to Claude to choose responses that contain “the least personal, private, or confidential information belonging to others.” But as a pair of University of Washington PhD students researching privacy and technology saw firsthand recently, these safeguards don’t always work.

“One day, I was just playing around on Gemini, and I searched for Yael Eiger, my f