MosaicLeaks: Can your research agent keep a secret?

TL;DR: Deep research agents increasingly combine private local documents with external tools like web retrieval, creating a privacy risk: an agent’s external queries may leak sensitive information. MosaicLeaks proposes a new deep-research task with multi-hop questions that interleave public and private information. Across the models we tested, agents frequently leaked private information, and training only for task performance made it worse. We propose a mosaic-leakage-aware RL training method, Privacy-Aware Deep Research (PA-DR), which raises strict chain success (the share of chains where every hop is answered correctly) from 48.7% to 58.7% while reducing answer/full-information leakage from 34.0% to 9.9%.


Privacy Leakage in Deep-Research Agents

A research agent at a healthcare firm is working through a routine question, and along the way it fires off a handful of ordinary-looking web searches. One references a cloud-migration milestone, one a January 2024 security disclosure, one narrows down which vendor got hit. No single query necessarily gives away the whole secret. But anyone watching the agent’s outbound traffic can reassemble the fragments: MediConn had migrated 70% of its infrastructure to the cloud by January 2025, a fact that lived only in private documents. This is the mosaic effect, and it’s the failure mode at the centre of MosaicLeaks.
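
The reassembly step can be illustrated with a toy check. The sketch below is only an illustration, not the adversary used in MosaicLeaks: the three queries are invented, the `fragments` set is a hypothetical stand-in for the pieces of a private fact, and substring matching is a crude proxy for what an observer could actually infer.

```python
from typing import Dict, Iterable, List, Set


def fragments_in(text: str, fragments: Iterable[str]) -> Set[str]:
    """Return the fragments that occur (case-insensitively) in text."""
    lowered = text.lower()
    return {f for f in fragments if f.lower() in lowered}


def mosaic_report(query_log: List[str], fragments: Set[str]) -> Dict[str, bool]:
    """Compare what each query exposes alone with what the whole log exposes."""
    per_query = [fragments_in(q, fragments) for q in query_log]
    cumulative: Set[str] = set()
    for found in per_query:
        cumulative |= found
    return {
        "single_query_complete": any(found == fragments for found in per_query),
        "cumulative_complete": cumulative == fragments,
    }


# Hypothetical outbound queries, loosely modelled on the example above.
query_log = [
    "healthcare provider 70% cloud migration milestone",
    "January 2024 security disclosure healthcare vendor",
    "MediConn vendor security incident",
]
# Hypothetical pieces of the private fact that an observer wants to recover.
fragments = {"MediConn", "70%", "cloud"}

report = mosaic_report(query_log, fragments)
print(report)
```

In this toy log no single query contains every fragment, yet the union of the queries does, which is the pattern the mosaic effect describes.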

MosaicLeaks treats those web queries as the leakage channel: the adversary never sees the private documents or the agent’s reasoning, only the cumulative query log, and tries to infer private enterprise information from it. We measure leakage in three ways, depending on what the adversary can infer from the observed queries:

| Leakage type | What the adversary sees | What counts as leakage |
| --- | --- | --- |
| Intent leakage | Only the agent’s web-query log | The adversary can infer the private research questions or goals the agent was trying to answer |
| Answer leakage | The web-query log plus a question about private information | The adversary can answer those private questions without seeing the private documents |
| Full-information leakage | Only the web-query log | The adversary can state verifiably true private claims, even without being given the questions |

These three represent increasing levels of concern. Intent leakage reveals what the agent is investigating. Answer leakage means the query log holds enough to answer a private question someone already has in hand. Full-information leakage is the strongest case: the observer can discover and state private facts without being told what to look for.
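
The three settings differ only in what the adversary is handed, which can be sketched as a small data structure. This is a minimal sketch under assumptions: the names `Leakage`, `AdversaryInput`, and `build_adversary_input` are hypothetical, and how the adversary's output is then judged is left out entirely.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence, Tuple


class Leakage(IntEnum):
    """Leakage types, ordered by increasing level of concern."""
    INTENT = 1
    ANSWER = 2
    FULL_INFORMATION = 3


@dataclass(frozen=True)
class AdversaryInput:
    """Everything the adversary observes in one evaluation setting."""
    query_log: Tuple[str, ...]
    private_question: Optional[str] = None  # only set for answer leakage


def build_adversary_input(
    kind: Leakage,
    query_log: Sequence[str],
    private_question: Optional[str] = None,
) -> AdversaryInput:
    """Assemble the adversary's view; private documents are never included."""
    if kind is Leakage.ANSWER:
        if private_question is None:
            raise ValueError("answer leakage requires a private question")
        return AdversaryInput(tuple(query_log), private_question)
    # Intent and full-information leakage: the query log and nothing else.
    return AdversaryInput(tuple(query_log))
```

Note that the intent and full-information settings receive identical inputs; they differ in what the adversary is asked to produce from the log.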


Building MosaicLeaks

MosaicLeaks contains 1,001 multi-hop research chains over local enterprise documents and a controlled web corpus. The goal is to create tasks that are highly likely to induce privacy leakage from enterprise documents yet can still be solved without leaking. Each chain interleaves local and web sub-questions. The answer to one sub-question becomes a bridge entity in the next, so the agent must retrieve local information before it can form the next useful web query.
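
One way to picture a chain, and the strict chain success metric from the TL;DR (the share of chains where every hop is answered correctly), is the sketch below. The `Hop` record, the example questions, and the answers "Acme Health" and "January 2024" are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Hop:
    source: str    # "local" (private documents) or "web"
    question: str
    answer: str


# Hypothetical two-hop chain: the local answer is the bridge entity
# that the following web question depends on.
chain = [
    Hop("local", "Which vendor does the internal incident report name?", "Acme Health"),
    Hop("web", "When did Acme Health publish its security disclosure?", "January 2024"),
]


def strict_chain_success(results: Sequence[Sequence[bool]]) -> float:
    """Share of chains in which every hop was answered correctly.

    results[i][j] is True when hop j of chain i was answered correctly.
    A chain with no hops is counted as a failure.
    """
    if not results:
        return 0.0
    solved = sum(1 for hops in results if len(hops) > 0 and all(hops))
    return solved / len(results)


# Three chains: the second one fails a single hop, so it does not count.
per_hop_correct: List[List[bool]] = [
    [True, True, True],
    [True, False, True],
    [True, True],
]
print(strict_chain_success(per_hop_correct))
```

Under this definition a chain earns no partial credit: one wrong hop makes the whole chain count as unsolved.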

| Step | Construction stage | What it does |
| --- | --- | --- |
| 1 | Seed private facts | Generate private question-answer pairs from enterprise documents, such as internal metrics, dates, dollar amounts, and named entities. |
| 2 | Bridge documents | Use the previous answer to retrieve a new document and generate the next question, creating explicit local-web dependencies. |
| 3 | Validate chains | Check answerability, retrievability, source order, and whether the previous answer is necessary rather than decorative. |
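
The validation step can be pictured as a set of named checks over a chain. The sketch below is an assumption about how such checks might be wired together, not the pipeline's actual code: `validate_chain` and its three callables are hypothetical, and the stubs used in the example stand in for whatever model or retriever performs the real checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Hop:
    source: str    # "local" or "web"
    question: str
    answer: str


def validate_chain(
    chain: Sequence[Hop],
    expected_sources: Sequence[str],
    is_answerable: Callable[[Hop], bool],
    is_retrievable: Callable[[Hop], bool],
    solvable_without_previous: Callable[[Hop, Hop], bool],
) -> Dict[str, bool]:
    """Run the four checks named in the table and report each result."""
    pairs = list(zip(chain, chain[1:]))  # (previous hop, next hop)
    return {
        "answerable": all(is_answerable(h) for h in chain),
        "retrievable": all(is_retrievable(h) for h in chain),
        "source_order": [h.source for h in chain] == list(expected_sources),
        # The previous answer is decorative if the next hop is solvable without it.
        "previous_answer_necessary": not any(
            solvable_without_previous(prev, nxt) for prev, nxt in pairs
        ),
    }


# Hypothetical chain and stub checks, for illustration only.
chain = [
    Hop("local", "Which vendor does the internal incident report name?", "Acme Health"),
    Hop("web", "When did Acme Health publish its security disclosure?", "January 2024"),
]
checks = validate_chain(
    chain,
    expected_sources=["local", "web"],
    is_answerable=lambda hop: bool(hop.answer),
    is_retrievable=lambda hop: True,
    # Crude stub: treat a hop as solvable alone if it never mentions the bridge.
    solvable_without_previous=lambda prev, nxt: prev.answer.lower() not in nxt.question.lower(),
)
print(checks)
```

Returning one boolean per check, rather than a single pass/fail flag, makes it easier to see why a candidate chain was rejected.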

Can’t you just tell the agent not to leak?

The obvious fix is to just ask. Add a line…