Reimagining the mouse pointer for the AI era
May 12, 2026 | Adrien Baranes and Rob Marchant
We are developing more seamless, intuitive ways to collaborate with AI. The mouse pointer has been a constant companion on computer screens, across every website, document and workflow. Despite how technologies have changed, the pointer has barely evolved in more than half a century.
We’ve been exploring new AI-powered capabilities to help the pointer not only understand what it’s pointing at, but also why it matters to the user.
Our goal is to address a common frustration: because a typical AI tool lives in its own window, users need to drag their world into it. We want the opposite: intuitive AI that meets users across all the tools they use, without interrupting their flow. For example, imagine pointing to an image of a building and requesting “Show me directions”. Nothing more is needed when the AI system already understands the context.
Today, we’re outlining the underlying principles guiding our thinking on future user interfaces, and sharing experimental demos of an AI-enabled pointer, powered by Gemini. For example, you could visit Google AI Studio to edit an image or find places on the map, just by pointing and speaking.
Our interaction principles
We’ve developed four principles that together shift the hard work of conveying context and intent from the user to the computer, replacing text-heavy prompts with simpler, more intuitive interactions.
Maintain the flow
AI capabilities should work across all apps, not force users into “AI detours” between them. Our prototype AI-enabled pointer is available wherever the user is working. For example, they could point at a PDF and request a bullet-point summary to paste directly into an email, hover over a table of statistics and request a pie chart version, or highlight a recipe and ask for all the ingredients doubled.
Show and tell
Current AI models demand precise instructions. To get a good response, a user has to write a detailed prompt. An AI-enabled pointer would streamline this process by smoothly capturing the visual and semantic context around the pointer, letting the computer “see” and understand what’s important to the user.
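One way to picture the “show” half of this idea: before anything is sent to a model, the system could snapshot a small region of the screen centred on the pointer. The sketch below is purely illustrative, not the actual implementation; the function name, the fixed `radius`, and the simple clamping geometry are all our assumptions.

```python
def context_region(pointer_x, pointer_y, screen_w, screen_h, radius=200):
    """Return the (left, top, right, bottom) box of a capture region
    centred on the pointer, clamped so it never leaves the screen."""
    left = max(0, pointer_x - radius)
    top = max(0, pointer_y - radius)
    right = min(screen_w, pointer_x + radius)
    bottom = min(screen_h, pointer_y + radius)
    return left, top, right, bottom

# Near the top-left corner, the box is clipped rather than centred.
box = context_region(100, 100, 1920, 1080)
```

In a real system the cropped pixels, plus any transcribed speech, would travel to a multimodal model together; this sketch only covers the geometry of deciding what to capture.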
Embrace the power of “This” and “That”
In everyday interactions, humans rarely speak in long, detailed paragraphs. We might say, “Fix this”, “Move that here”, or “What does this mean?” — while relying on physical gestures and shared context to fill in gaps. An AI system that understands this combination of context, pointing and speech would allow users to make complex requests in natural shorthand, no fiddly prompting required.
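Resolving a word like “this” amounts to grounding it in what the pointer is over. A minimal sketch of one plausible strategy (the `Entity` type and the contains-then-nearest fallback are our assumptions, not how the prototype works): pick the on-screen entity whose bounding box contains the pointer, and fall back to the nearest one.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    label: str
    box: tuple  # (left, top, right, bottom) in screen pixels

def resolve_deixis(pointer, entities):
    """Map 'this'/'that' to an entity: prefer a box containing the
    pointer, otherwise the entity whose box centre is closest."""
    x, y = pointer
    for e in entities:
        l, t, r, b = e.box
        if l <= x <= r and t <= y <= b:
            return e

    def sq_dist(e):
        l, t, r, b = e.box
        cx, cy = (l + r) / 2, (t + b) / 2
        return (cx - x) ** 2 + (cy - y) ** 2

    return min(entities, key=sq_dist) if entities else None
```

The speech (“Fix this”) supplies the verb; the resolver supplies the noun, so neither channel has to carry the whole request on its own.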
Turn pixels into actionable entities
For decades, computers have only tracked where we are pointing. AI can now also understand what the user is pointing at. This transforms pixels into structured entities, such as places, dates, and objects, that users can interact with instantly. A photo of a scribbled note becomes an interactive to-do list; a paused frame in a travel video becomes a booking link for that cool-looking restaurant.
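Concretely, “structured entities” suggests a small typed schema sitting between the model and the UI. The sketch below is hypothetical end to end: the JSON is invented example output (not real model output), and `ActionableEntity` and its fields are names we chose for illustration.

```python
import json
from dataclasses import dataclass, field

@dataclass
class ActionableEntity:
    kind: str            # e.g. "place", "date", "object"
    text: str            # what the user is pointing at
    actions: list = field(default_factory=list)  # actions a UI could offer

# Invented example of what a model might return for a travel-video frame.
RAW = """
[{"kind": "place", "text": "Sagrada Familia",
  "actions": ["show_directions", "book_tickets"]},
 {"kind": "date", "text": "2026-05-12",
  "actions": ["add_to_calendar"]}]
"""

def parse_entities(raw):
    """Turn the model's JSON into typed entities a UI can attach menus to."""
    return [ActionableEntity(**item) for item in json.loads(raw)]
```

Once the output is typed rather than free text, each entity kind can map to a fixed menu of actions, which is what turns a paused video frame into something clickable.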
Building technology that adapts to human behavior — rather than forcing users to adapt to it — enables a future where collaborating with AI feels truly intuitive, fluid and seamless. We’re excited that these human-first concepts are being woven into products we use every day.
Applying this work in our products
We are now integrating these principles to reimagine pointing in Chrome and our new Googlebook laptop experience. Starting today, instead of writing a complex prompt, you can use your pointer to ask Gemini in Chrome about the part of the webpage you care about. For example, you can select a few products on a page and ask to compare them, or point to where you want to visualize a new couch in your living room. Similarly, we’ll soon roll out Magic Pointer in Googlebook, allowing users to harness…