Gemini API File Search is now multimodal: build efficient, verifiable RAG
We’re introducing three major updates to the Gemini API File Search tool: multimodal support, custom metadata and page-level citations. These features help developers bring structure to unstructured data for efficient, verifiable RAG.
Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata. We’re also introducing page citations to improve grounding and transparency. Whether you are prototyping a weekend project or scaling a production application for thousands of users, your RAG systems can now natively process and better organize your text and visual data.
Give your apps a photographic memory
File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data, providing your agents with contextual awareness. Think of a creative agency trying to dig up a specific visual asset. Instead of relying on keywords or filenames, your app can search an entire archive for an image that matches the emotional tone or visual style described in a natural-language brief.
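To make the idea of searching images with a natural-language brief more concrete, here is a minimal sketch of similarity search in a shared embedding space. It does not call the Gemini API or the Gemini Embedding 2 model: the three-dimensional vectors, the file names, and the `search` helper are all hypothetical stand-ins, assuming only that images and text are mapped into the same vector space.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written vectors standing in for real embeddings.
# In a real system an embedding model would produce these for both
# images and text, so they can be compared directly.
ARCHIVE = {
    "beach_sunset.jpg": [0.9, 0.1, 0.2],
    "office_meeting.png": [0.1, 0.9, 0.3],
    "brand_guidelines.txt": [0.2, 0.3, 0.9],
}


def search(query_vector, archive, top_k=2):
    """Rank archive items by similarity to the query and return the top names."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in archive.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Stand-in for the embedding of a brief such as "warm, calm, golden-hour mood".
brief_vector = [0.8, 0.2, 0.1]
results = search(brief_vector, ARCHIVE)
print(results)
```

Because the query and the assets live in one space, the ranking never looks at filenames or keywords, which is the property the creative-agency example relies on.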
Filter the noise with custom metadata
Dumping files into a database is easy. Finding the right one at scale is the real challenge. Custom metadata lets you attach key-value labels to your unstructured data, such as department: Legal or status: Final. By applying metadata filters at query time, your application can scope each request to just the slice of data it needs. This cuts the noise from irrelevant documents, improving both the speed and the accuracy of your RAG workflows.
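The effect of a query-time metadata filter can be illustrated with a small, self-contained sketch. This is a conceptual model only, not the File Search API: the `Document` class, `filter_by_metadata`, the naive `keyword_search` stand-in for retrieval, and the sample corpus are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(documents, required):
    """Keep only documents whose metadata contains every required key-value pair."""
    return [
        doc
        for doc in documents
        if all(doc.metadata.get(key) == value for key, value in required.items())
    ]


def keyword_search(documents, query):
    """Naive stand-in for retrieval: case-insensitive substring match."""
    needle = query.lower()
    return [doc.name for doc in documents if needle in doc.text.lower()]


# Hypothetical corpus labeled with the keys used in the text above.
CORPUS = [
    Document("nda_v3.pdf", "Mutual non-disclosure agreement, signed.",
             {"department": "Legal", "status": "Final"}),
    Document("nda_draft.pdf", "Non-disclosure agreement, first draft.",
             {"department": "Legal", "status": "Draft"}),
    Document("launch_plan.pdf", "Press plan; partners under non-disclosure.",
             {"department": "Marketing", "status": "Final"}),
]

# Without a filter, every document mentioning the term is a candidate.
unscoped_hits = keyword_search(CORPUS, "non-disclosure")

# With a filter, retrieval only ever sees the Legal documents marked Final.
scoped = filter_by_metadata(CORPUS, {"department": "Legal", "status": "Final"})
scoped_hits = keyword_search(scoped, "non-disclosure")
print(unscoped_hits, scoped_hits)
```

Filtering before retrieval means the draft and the marketing plan never reach the ranking step, so they cannot crowd out the document the user actually wants.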
Show your work with page citations
When your application pulls an answer from a massive PDF, users need to verify exactly where that answer came from. File Search now ties the model’s response directly to the original source. It captures the page number for every piece of indexed information. This level of granularity allows you to point users directly to the right spot, which helps build trust and makes your tool immediately useful for rigorous fact-checking.
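As a rough illustration of why storing a page number with each indexed piece of text makes answers verifiable, consider the sketch below. It is not how File Search is implemented and does not reflect its response format: the `Chunk` class, the one-chunk-per-page indexing, the citation string, and the sample report are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    page: int
    text: str


def index_pages(source, pages):
    """Create one chunk per page, recording 1-based page numbers."""
    return [
        Chunk(source, number, text)
        for number, text in enumerate(pages, start=1)
    ]


def find_with_citations(chunks, query):
    """Return (passage, citation) pairs for chunks that mention the query."""
    needle = query.lower()
    return [
        (chunk.text, f"{chunk.source}, p. {chunk.page}")
        for chunk in chunks
        if needle in chunk.text.lower()
    ]


# Hypothetical three-page document, one string per page.
PAGES = [
    "Introduction and scope of the annual report.",
    "Revenue grew in the third quarter.",
    "Risk factors and outlook.",
]

chunks = index_pages("annual_report.pdf", PAGES)
answers = find_with_citations(chunks, "revenue")
for passage, citation in answers:
    print(f"{passage} [{citation}]")
```

Since the page number travels with the text from indexing through to the answer, the application can link each claim to the exact page a reader should check.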
Get started with File Search
We want to make it as easy as possible to store and retrieve the data that makes your ideas work. The File Search tool handles the heavy infrastructure so you can focus on building the product. Uploading files and searching across them is simple. Explore code snippets in our developer guide and the Gemini API documentation to learn how to build with File Search.