Holo3.1: Fast & Local Computer Use Agents

Holo3.1：快速且本地化的计算机操作智能体

Last March, we released Holo3, our state-of-the-art computer-use model. Adoption was immediate. Developers, enterprises, and partners started deploying Holo3 across a wide range of workflows, from browser automation and business software to internal tools and desktop applications. 去年三月，我们发布了最先进的计算机操作模型 Holo3。该模型随即获得了广泛采用。开发者、企业和合作伙伴开始将 Holo3 部署到各种工作流中，涵盖了从浏览器自动化、商业软件到内部工具和桌面应用程序的广泛场景。

As adoption grew, we realized performance alone was no longer enough. Users want to run the same computer-use capabilities across desktop and mobile environments, with seamless integration with different agent frameworks. They want deployment flexibility, from cloud inference to fully local execution on end-user devices. This is why we are releasing the Holo3.1 family. 随着采用率的增长，我们意识到仅有性能表现已不足够。用户希望在桌面和移动环境中运行相同的计算机操作功能，并能与不同的智能体框架无缝集成。他们需要部署的灵活性，从云端推理到在终端设备上完全本地化执行。这就是我们发布 Holo3.1 系列的原因。

Holo3.1 improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets. For the first time, we release quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4. Holo3.1 is a major step toward our vision of universal computer-use agents: systems that can operate across environments, integrate into any agent stack, and run wherever the workflow lives. Holo3.1 在生产环境中最关键的三个维度上提升了鲁棒性：环境（Web、桌面、移动端）、智能体框架以及部署目标。我们首次发布了针对本地推理优化的量化检查点，包括 FP8、Q4 GGUF 和 NVFP4。Holo3.1 是我们实现“通用计算机操作智能体”愿景的重要一步：即构建一套能够跨环境运行、集成到任何智能体技术栈，并能在工作流所在的任何位置运行的系统。

Computer Use Across GUI Environments and Agent Harnesses

跨 GUI 环境与智能体框架的计算机操作

Based on the Qwen family, Holo3.1 was designed to improve robustness across the environments where computer-use agents are actually deployed, while retaining state-of-the-art performance. As teams moved Holo3 from evaluation to production, we repeatedly observed the same challenge: strong performance in one setting does not necessarily transfer to another. Mobile devices, alternative agent harnesses, and different execution frameworks all introduce their own sources of distribution shift. 基于 Qwen 系列，Holo3.1 旨在提升计算机操作智能体在实际部署环境中的鲁棒性，同时保持最先进的性能。当团队将 Holo3 从评估阶段转向生产环境时，我们反复观察到一个挑战：在一种设置下的强劲表现并不一定能迁移到另一种设置中。移动设备、不同的智能体框架以及各种执行框架，都会引入各自的分布偏移。

Mobile Automation

移动端自动化

Holo3.1 expands Holo3’s capabilities beyond browser and desktop control, delivering major gains on mobile environments. On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%. Holo3.1 将 Holo3 的能力从浏览器和桌面控制扩展到了移动端，并在移动环境中取得了显著进展。在 AndroidWorld 测试中，我们的 35B-A3B 模型从 67% 提升至 79.3%，而较小的 4B 和 9B 版本则从 58% 提升至 72%。

Cross-Harness Performance

跨框架性能

To better support teams deploying Holo inside third-party agent stacks, Holo3.1 introduces native support for function-calling protocols in addition to the structured JSON outputs already available in Holo3. Across OSWorld and our internal benchmark suite covering e-commerce, business software, and collaboration workflows, function-calling and native execution now achieve near-parity performance. Holo3.1 also delivers more than a 25% improvement over Holo3 when evaluated inside our Holotab product harness. 为了更好地支持在第三方智能体技术栈中部署 Holo 的团队，Holo3.1 除了提供 Holo3 已有的结构化 JSON 输出外，还引入了对函数调用协议的原生支持。在 OSWorld 和我们涵盖电子商务、商业软件及协作工作流的内部基准测试套件中，函数调用和原生执行现已达到近乎同等的性能。此外，在我们的 Holotab 产品框架中进行评估时，Holo3.1 相比 Holo3 性能提升超过 25%。

Smaller Sizes for Cost-Performance Tradeoffs

兼顾成本与性能的小尺寸模型

To further enable local and on-device inference, we are also releasing new model sizes including small models (0.8B, 4B, and 9B) for cost-effective and private deployment, in addition to the larger 35B-A3B model for state-of-the-art performance. 为了进一步支持本地和端侧推理，我们还发布了新的模型尺寸，包括用于高性价比和私密部署的小型模型（0.8B、4B 和 9B），以及用于追求极致性能的大型 35B-A3B 模型。

Fast & Local Inference

快速且本地化的推理

This is our first release to ship quantized weights. We’re starting with 35B-A3B checkpoints, available in FP8, Q4 GGUF, and NVFP4. For NVFP4, we used NVIDIA’s Model Optimizer in a W4A16 configuration. These checkpoints enable fast local inference for Computer Use Agents with little to no degradation in model performance. FP8 and NVFP4 achieve the same OSWorld scores, only about two points below the full-precision BF16 checkpoint. The speedups are substantial: on DGX Spark, NVFP4 W4A16 delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16. 这是我们首次发布量化权重版本。我们首先提供 35B-A3B 检查点，支持 FP8、Q4 GGUF 和 NVFP4 格式。对于 NVFP4，我们使用了 NVIDIA 的模型优化器（Model Optimizer）并采用 W4A16 配置。这些检查点使计算机操作智能体能够实现快速的本地推理，且模型性能几乎没有下降。FP8 和 NVFP4 在 OSWorld 上获得了相同的分数，仅比全精度 BF16 检查点低约两个百分点。速度提升非常显著：在 DGX Spark 上，NVFP4 W4A16 的总 Token 吞吐量是 FP8 的 1.41 倍，是 BF16 的 1.74 倍。

Towards Local Agents on Consumer Hardware

面向消费级硬件的本地智能体

We also release Q4 GGUF checkpoints aimed at local deployment of Computer Use Agents on consumer hardware. The agent itself runs locally on a Windows or Mac machine, while the model can either run on that same machine—we include reference numbers for Apple Silicon—or on a DGX Spark on the same network. In both cases, execution stays fully private and local, with nothing leaving the user’s network. 我们还发布了 Q4 GGUF 检查点，旨在将计算机操作智能体部署在消费级硬件上。智能体本身在 Windows 或 Mac 机器上本地运行，而模型既可以在同一台机器上运行（我们提供了 Apple Silicon 的参考数据），也可以在同一网络下的 DGX Spark 上运行。在这两种情况下，执行过程都保持完全私密和本地化，没有任何数据离开用户的网络。

On Spark, agent harness optimizations we developed with NVIDIA combined with the NVFP4 quantization above deliver a compound ~2× end-to-end speedup over the FP8 baseline, cutting average step time from 6.8s to 3.3s. 在 Spark 上，我们与 NVIDIA 共同开发的智能体框架优化，结合上述 NVFP4 量化技术，实现了相比 FP8 基准约 2 倍的端到端速度提升，将平均单步执行时间从 6.8 秒缩短至 3.3 秒。

Availability

可用性

The Holo3.1 family is available in four sizes: Holo3.1 系列提供四种尺寸：

Model	Deployment Target
Holo3.1-0.8B	Ultra-lightweight local agents
Holo3.1-4B	Cost-efficient deployment
Holo3.1-9B	Balanced performance and latency
Holo3.1-35B-A3B	State-of-the-art performance

模型	部署目标
Holo3.1-0.8B	超轻量级本地智能体
Holo3.1-4B	高性价比部署
Holo3.1-9B	性能与延迟的平衡
Holo3.1-35B-A3B	最先进的性能

We are also releasing optimized FP8, NVFP4, and Q4 GGUF checkpoints for local and edge deployment. 我们还发布了针对本地和边缘部署优化的 FP8、NVFP4 和 Q4 GGUF 检查点。

Get Started

开始使用

Holo Models API: https://hcompany.ai/holo-models-api Hugging Face: https://huggingface.co/collections/Hcompany/holo31 We look forward to seeing what developers build with Holo3.1. 我们期待看到开发者利用 Holo3.1 构建出什么样的应用。