How to Use Claude Code in Your Browser

Agentic AI. Learn how to apply coding agents to verify work in your browser. Eivind Kjosbakken, Jun 22, 2026, 8 min read.

Learn how to apply coding agents to navigate through your browser to verify work. Image by ChatGPT.

A common misconception about coding agents is that they can only be used for programming. In fact, they are far more general agents, capable of handling essentially any office task, though with varying degrees of success.

One area that has received a lot of attention is web browsing with coding agents such as Claude Code and OpenAI’s Codex. These agents have become highly proficient at navigating the web, which is useful for many different tasks.

Web browsing can, of course, be useful in many situations, such as fetching information from the Internet or filling in forms for you. However, some of these use cases can violate a site’s terms of service, so you should be aware of this.

The main use case I’ll cover today is fully legal: using coding agents to navigate applications you’re developing yourself, in order to test and verify implementations.

Previously, I’ve talked a lot about creating verifiable tasks whenever you ask coding agents to perform actions for you. Giving coding agents access to your browser to test implementations is a crucial part of this verifiability. This infographic highlights the main topic of this article.

I’ll discuss how to give your coding agent access to a browser to make it much more powerful: why the agent needs browser access, the loop you should set up, and how to use that access to make the agent verify its own work. Image by ChatGPT.

Why coding agents should use your browser

First of all, I’d like to cover why you should care about running browsers with your coding agents. Browsers are an important interface humans use to interact with the world. Through your browser, you can perform many different actions, such as reading up on information, filling in applications, and so on.

Because the browser is such an important interface, a lot of attention and research has gone into navigating it effectively. Numerous companies specialize in browser navigation, and the frontier labs offer such an integration in their products, such as OpenAI’s Codex and Anthropic’s Claude Code.

Imagine you’re telling a coding agent to implement a design from an HTML design file. The coding agent is, of course, good at front-end code and can start implementing right away. However, if it can’t navigate the browser, it has no way of seeing the rendered result and therefore can’t verify its own work. This greatly increases the chance that the agent will make errors and fail to implement the exact design you wanted.

Luckily, there is a very simple fix to this problem: give your coding agent access to the browser. Allow it to take screenshots of the design it has implemented and compare them to screenshots of the design you wanted it to implement. The coding agent can then keep iterating until the implementation looks exactly like the design file.
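
To make the comparison step concrete, here is a minimal sketch of one way two screenshots could be compared programmatically. It is an illustration only, not how Claude Code or Codex compares images internally. It assumes both screenshots have already been decoded into equal-sized grids of RGB tuples, and the names diff_ratio and matches_design, as well as the tolerance and threshold values, are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels


def diff_ratio(implemented: Image, design: Image, tolerance: int = 8) -> float:
    """Return the fraction of pixels that differ by more than `tolerance` in any channel."""
    if len(implemented) != len(design) or any(
        len(a) != len(b) for a, b in zip(implemented, design)
    ):
        raise ValueError("screenshots must have the same dimensions")

    total = 0
    differing = 0
    for row_a, row_b in zip(implemented, design):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b)):
                differing += 1
    return differing / total if total else 0.0


def matches_design(implemented: Image, design: Image, max_diff: float = 0.01) -> bool:
    """Treat the implementation as done when at most `max_diff` of the pixels differ."""
    return diff_ratio(implemented, design) <= max_diff


# Tiny 2x2 example: the design has one red pixel that the attempt is missing.
white = (255, 255, 255)
red = (255, 0, 0)
design = [[white, white], [white, red]]
attempt = [[white, white], [white, white]]

print(diff_ratio(attempt, design))      # 0.25
print(matches_design(attempt, design))  # False
print(matches_design(design, design))   # True
```

A check like this only gives a coarse signal. It can tell the agent that something is off, but the agent still has to look at the screenshots to decide what to change.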

This saves you, as the programmer, a lot of time, since you don’t have to repeatedly check the result and point out the mistakes the coding agent made during the design implementation. That frees you up for other tasks and makes you more productive as an engineer.

How it works

Before moving on to how to navigate browsers with Claude Code, I want to briefly cover how it works. In theory, navigating the browser is quite simple. The coding agent opens the browser, where it has access to a few actions: take a screenshot, click (coordinate-based), and enter text.

These are the three main actions the coding agent performs, and they are basically all you need to interact with a browser. The agent takes screenshots because that’s how it finds out what is on each page and figures out where to click. It also needs to be able to click different places on the website, for example buttons or input fields. Clicking is coordinate-based.

So if the coding agent wants to click in a specific location, it outputs text such as: click(x=0.754, y=0.328). It calls the click function and passes the coordinates where it wants to click. The coordinates are typically normalized to a set range, such as between 0 and 1. Once the agent has clicked a specific location, such as an input field, it can enter text there. Together, these actions let it do whatever it needs to in the browser.
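
As a rough sketch of what happens to such an output, the snippet below parses a click string in the format shown above and converts the normalized coordinates into pixel positions for a given viewport. The function names parse_click and to_pixels and the 1280x800 viewport are assumptions made for illustration. Real browser tools may use a different output format or coordinate convention.

```python
import re
from typing import Tuple

# Matches text such as "click(x=0.754, y=0.328)".
CLICK_PATTERN = re.compile(
    r"click\(\s*x\s*=\s*([0-9]*\.?[0-9]+)\s*,\s*y\s*=\s*([0-9]*\.?[0-9]+)\s*\)"
)


def parse_click(text: str) -> Tuple[float, float]:
    """Extract normalized (x, y) coordinates from the agent's output text."""
    match = CLICK_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no click action found in: {text!r}")
    x, y = float(match.group(1)), float(match.group(2))
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("coordinates must be normalized to the range 0..1")
    return x, y


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Scale normalized coordinates to pixels, clamped to stay inside the viewport."""
    px = min(width - 1, round(x * width))
    py = min(height - 1, round(y * height))
    return px, py


x, y = parse_click("click(x=0.754, y=0.328)")
print(to_pixels(x, y, width=1280, height=800))  # (965, 262)
```

Normalizing the coordinates means the same output works regardless of the screenshot resolution. Only the final scaling step needs to know the actual viewport size.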

The coding agent can, of course, also perform other kinds of clicks, such as a right-click to get more options on the page. These steps then run as a loop: the agent takes a screenshot, chooses which action to perform, checks whether it has achieved its goal, and repeats. It simply continues like this until it has achieved its goal in the browser.
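
The loop can be sketched as follows. To keep the example runnable without a real browser, FakeBrowser is a hypothetical stand-in with a single text field and a submit button, and choose_action is a hard-coded placeholder for the point where a real agent would ask the model what to do next. None of these names come from Claude Code or Codex.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    kind: str        # "click" or "enter_text"
    x: float = 0.0   # normalized coordinates, used by "click"
    y: float = 0.0
    text: str = ""   # used by "enter_text"


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser: one text field (top) and one submit button (bottom)."""
    focused: bool = False
    field_value: str = ""
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def take_screenshot(self) -> dict:
        # A real screenshot is an image; a dict keeps this sketch self-contained.
        return {
            "focused": self.focused,
            "field_value": self.field_value,
            "submitted": self.submitted,
        }

    def click(self, x: float, y: float) -> None:
        self.log.append(f"click({x}, {y})")
        if y < 0.5:
            self.focused = True      # top half: focus the text field
        elif self.field_value:
            self.submitted = True    # bottom half: press submit if the field is filled

    def enter_text(self, text: str) -> None:
        self.log.append(f"enter_text({text!r})")
        if self.focused:
            self.field_value += text


def choose_action(screenshot: dict) -> Optional[Action]:
    """Placeholder policy. Returns None once the goal (a submitted form) is reached."""
    if screenshot["submitted"]:
        return None
    if not screenshot["focused"]:
        return Action("click", x=0.5, y=0.25)
    if not screenshot["field_value"]:
        return Action("enter_text", text="hello")
    return Action("click", x=0.5, y=0.75)


def run_agent(browser: FakeBrowser, max_steps: int = 10) -> bool:
    """Screenshot, pick an action, act, repeat. Returns True if the goal was reached."""
    for _ in range(max_steps):
        screenshot = browser.take_screenshot()
        action = choose_action(screenshot)
        if action is None:
            return True
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "enter_text":
            browser.enter_text(action.text)
    return False


browser = FakeBrowser()
print(run_agent(browser))  # True
print(browser.log)
```

The max_steps limit is a design choice of this sketch: it prevents the loop from running forever if the agent never reaches its goal.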

How to navigate browsers with Claude Code

Next, I want to cover exactly how to navigate browsers using Claude Code. The principles I’ll cover here apply to basically any coding agent, and I won’t cover techniques that can’t easily be generalized to other coding agents.

First, if you’re using Claude Code, it has a built-in Chrome integration, which you can enable by typing the following command in the Claude Code window: /chrome. Codex also has a corresponding command. This gives Claude access to open Chrome on your computer and use it to verify tasks. I think the Chrome integration in Claude works all right, but it’s not optimal. I have a better experience using the…