bytedance / UI-TARS-desktop

TARS* is a Multimodal AI Agent stack, currently shipping two projects: Agent TARS and UI-TARS-desktop.

Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product. It primarily ships with a CLI and a Web UI. It aims to provide a workflow closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

UI-TARS Desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model. It primarily ships local and remote computer operators as well as browser operators.


News

[2025-11-05] 🎉 We’re excited to announce the release of Agent TARS CLI v0.3.0! This version brings streaming support for multiple tools (shell commands, multi-file structured display), runtime settings with timing statistics for tool calls and deep thinking, and an Event Stream Viewer for data-flow tracking and debugging. It also adds exclusive support for the AIO Agent Sandbox as an isolated, all-in-one tool execution environment.

[2025-06-25] We released Agent TARS Beta and Agent TARS CLI. Agent TARS Beta is a multimodal AI agent that explores a way of working closer to human-like task completion through rich multimodal capabilities (such as GUI Agent and Vision) and seamless integration with various real-world tools.

[2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features, Remote Computer Operator and Remote Browser Operator, both completely free. No configuration is required: simply click to remotely control any computer or browser and experience a new level of convenience and intelligence.

[2025-04-17] - 🎉 We’re thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.

[2025-02-20] - 📦 Introduced the UI TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.

[2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language deployment tutorial (中文版: GUI模型部署教程, "Chinese version: GUI Model Deployment Tutorial") with new information about the ModelScope platform. You can now use ModelScope for deployment.


Agent TARS

Core Features

  • 🖱️ One-Click Out-of-the-box CLI - Supports both headful Web UI and headless server execution.
  • 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
  • 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
  • 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
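To make the Event Stream idea above concrete, here is a minimal sketch, assuming a simple append-only log of typed events. The event names (`user_message`, `tool_call`, `tool_result`, `assistant_message`), the `EventStream` class, and `buildContext` are hypothetical illustrations, not Agent TARS's actual protocol or API. The point is only that one log can feed both the model context (context engineering) and the UI.

```typescript
// Hypothetical event types for illustration; NOT the real Agent TARS protocol.
type AgentEvent =
  | { type: "user_message"; content: string }
  | { type: "tool_call"; id: string; name: string; args: Record<string, unknown> }
  | { type: "tool_result"; id: string; output: string }
  | { type: "assistant_message"; content: string };

type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };
type Listener = (event: AgentEvent) => void;

// Append-only log: the single source of truth for both context and UI.
class EventStream {
  private readonly events: AgentEvent[] = [];
  private readonly listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  append(event: AgentEvent): void {
    this.events.push(event);
    // Notify subscribers (e.g. a UI) synchronously, in append order.
    for (const listener of this.listeners) listener(event);
  }

  all(): readonly AgentEvent[] {
    return this.events;
  }
}

// "Context engineering": project each event into a chat message for the model.
function toMessage(e: AgentEvent): ChatMessage {
  switch (e.type) {
    case "user_message":
      return { role: "user", content: e.content };
    case "assistant_message":
      return { role: "assistant", content: e.content };
    case "tool_call":
      return { role: "assistant", content: `call ${e.name}(${JSON.stringify(e.args)})` };
    case "tool_result":
      return { role: "tool", content: e.output };
    default: {
      const unreachable: never = e;
      throw new Error(`unknown event: ${JSON.stringify(unreachable)}`);
    }
  }
}

function buildContext(events: readonly AgentEvent[]): ChatMessage[] {
  return events.map(toMessage);
}

// Usage: a UI subscriber and the context builder read the same events.
const stream = new EventStream();
const uiLog: string[] = [];
stream.subscribe((e) => uiLog.push(e.type));

stream.append({ type: "user_message", content: "List files" });
stream.append({ type: "tool_call", id: "1", name: "shell", args: { cmd: "ls" } });
stream.append({ type: "tool_result", id: "1", output: "README.md" });
stream.append({ type: "assistant_message", content: "Found README.md" });

const context = buildContext(stream.all());
```

Keeping the log as the only mutable state means a viewer, a debugger, or a context-trimming policy can each be written as a pure function over the same events.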

UI-TARS Desktop

UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
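The features above combine into a perceive, think, act loop: take a screenshot, ask a vision-language model for the next action, execute it with the mouse or keyboard, and record the step as feedback. The sketch below illustrates that loop under loudly stated assumptions: the action grammar (`click(x=…, y=…)`, `type(text='…')`, `finished()`), the `Model` and `Operator` interfaces, and `runAgent` are hypothetical and are not the UI-TARS model's real output format or the UI-TARS SDK's API. The model and operator are scripted fakes so the control flow can run on its own.

```typescript
// Hypothetical action grammar for illustration; NOT UI-TARS's real output format.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "finished" };

function parseAction(raw: string): Action {
  const click = raw.match(/^click\(x=(\d+),\s*y=(\d+)\)$/);
  if (click) return { kind: "click", x: Number(click[1]), y: Number(click[2]) };
  const typed = raw.match(/^type\(text='(.*)'\)$/);
  if (typed) return { kind: "type", text: typed[1] };
  if (raw === "finished()") return { kind: "finished" };
  throw new Error(`unparseable action: ${raw}`);
}

// Hypothetical stand-ins for a vision-language model and a local operator.
interface Model {
  nextAction(instruction: string, screenshot: string, history: string[]): string;
}
interface Operator {
  screenshot(): string;
  execute(action: Action): void;
}

function runAgent(instruction: string, model: Model, operator: Operator, maxSteps = 10): string[] {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = operator.screenshot(); // perceive
    const raw = model.nextAction(instruction, shot, history); // think
    const action = parseAction(raw);
    history.push(raw); // feedback for the next step and for a status display
    if (action.kind === "finished") break;
    operator.execute(action); // act
  }
  return history;
}

// Usage with scripted fakes instead of a real model and real input devices.
const scripted = ["click(x=120, y=48)", "type(text='hello')", "finished()"];
const executed: Action[] = [];
const fakeModel: Model = { nextAction: (_i, _s, h) => scripted[h.length] };
const fakeOperator: Operator = {
  screenshot: () => "base64-png-placeholder",
  execute: (a) => {
    executed.push(a);
  },
};
const trace = runAgent("Type hello in the search box", fakeModel, fakeOperator);
```

The `maxSteps` cap is a design choice for the sketch: it bounds the loop even if a model never emits a finishing action.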