alibaba / page-agent
Page Agent: The GUI Agent Living in Your Webpage. Control web interfaces with natural language.
✨ Features
🎯 Easy integration: No browser extension, Python, or headless browser required. Just in-page JavaScript. Everything happens in your web page.
📖 Text-based DOM manipulation: No screenshots. No multi-modal LLMs or special permissions needed.
🧠 Bring your own LLMs.
🐙 Optional Chrome extension for multi-page tasks, plus an MCP Server (Beta) to control it from outside.
💡 Use Cases
SaaS AI Copilot: Ship an AI copilot in your product in a few lines of code. No backend rewrite.
Smart Form Filling: Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems.
Accessibility: Make any web app accessible through natural language. Voice commands, screen readers, zero barrier.
Multi-page Agent: Extend your own web agent's reach across browser tabs (requires the Chrome extension).
MCP: Allow your agent clients to control your browser.
🚀 Quick Start
One-line integration: the fastest way to try PageAgent, using our free demo LLM:
<script src="{URL}" crossorigin="true"></script>
⚠️ For technical evaluation only. This demo CDN uses our free testing LLM API. By using it, you agree to its terms.
Mirror URLs:
- Global: https://cdn.jsdelivr.net/npm/page-agent@1.11.0/dist/iife/page-agent.demo.js
- China: https://registry.npmmirror.com/page-agent/1.11.0/files/dist/iife/page-agent.demo.js
Add ?autoInit=false to the script URL to load the script without automatically creating the demo agent. You can then instantiate it yourself with new window.PageAgent(...).
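As a rough illustration of the mirror URLs and the ?autoInit=false option described above, the sketch below builds the script URL and injects the tag manually. The helpers `demoScriptUrl` and `loadDemoScript` are hypothetical names introduced here, not part of page-agent; only the URL patterns, the query parameter, and `window.PageAgent` come from this README.

```javascript
// URL patterns copied from the mirror list above; {version} is substituted below.
const MIRRORS = {
  global: 'https://cdn.jsdelivr.net/npm/page-agent@{version}/dist/iife/page-agent.demo.js',
  china: 'https://registry.npmmirror.com/page-agent/{version}/files/dist/iife/page-agent.demo.js',
};

// Hypothetical helper: builds the demo script URL for a mirror and version.
// Passing autoInit = false appends ?autoInit=false to the URL.
function demoScriptUrl(mirror, version, autoInit = true) {
  const template = MIRRORS[mirror];
  if (!template) throw new Error(`Unknown mirror: ${mirror}`);
  const url = new URL(template.replace('{version}', version));
  if (!autoInit) url.searchParams.set('autoInit', 'false');
  return url.toString();
}

// Hypothetical helper: injects the script tag and resolves with the
// window.PageAgent constructor once loaded. Browser-only (uses document).
function loadDemoScript(src) {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = src;
    script.crossOrigin = 'anonymous';
    script.onload = () => resolve(window.PageAgent);
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    document.head.appendChild(script);
  });
}
```

In a browser, `loadDemoScript(demoScriptUrl('global', '1.11.0', false))` would resolve once the script has loaded, after which the agent can be constructed with `new window.PageAgent(...)` using your own configuration.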
NPM Installation:
npm install page-agent
import { PageAgent } from 'page-agent'

// Configure the agent with your own LLM endpoint and credentials.
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US',
})

// Top-level await requires an ES module context.
await agent.execute('Click the login button')
For more programmatic usage, see the 📖 documentation.
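Building on the `agent.execute(...)` call in the NPM example, a multi-step workflow could be driven by a small wrapper like the one below. This is a minimal sketch, not page-agent API: `runTasks` is a hypothetical helper, and it assumes only that `execute(task)` returns a promise. The shape of the resolved value is not specified in this README, so it is stored untouched.

```javascript
// Hypothetical helper (not part of page-agent): runs natural-language
// instructions one at a time against any object with an async execute(task).
async function runTasks(agent, tasks) {
  const results = [];
  for (const task of tasks) {
    try {
      // The resolved value is kept as-is because its shape is not documented here.
      const value = await agent.execute(task);
      results.push({ task, ok: true, value });
    } catch (error) {
      results.push({ task, ok: false, error });
      break; // stop at the first failure: later steps usually depend on earlier ones
    }
  }
  return results;
}
```

With the agent from the NPM example, this might be called as `await runTasks(agent, ['Click the login button', 'Open the settings page'])`. Stopping at the first failure is a design choice for dependent steps; independent tasks could continue instead.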
🌟 Awesome Page Agent
Built something cool with PageAgent? Add it here! Open a PR to share your project. These are community projects, not maintained or endorsed by us. Use at your own discretion.
🤝 Contributing
We welcome contributions from the community! See CONTRIBUTING.md for guidelines and docs/developer-guide.md for local development workflows. Please read the maintainer's note on principles and current state. Contributions generated entirely by bots or AI without substantial human involvement will not be accepted.
⚖️ License
MIT License
👏 Acknowledgments
This project builds upon the excellent work of browser-use. PageAgent is designed for client-side web enhancement, not server-side automation. The DOM processing components and the prompt are derived from browser-use.
We gratefully acknowledge the browser-use project and its contributors for their excellent work on web automation and DOM interaction patterns that helped make this project possible.
⭐ Star this repo if you find PageAgent helpful!