The Open Source Community is backing OpenEnv for Agentic RL
OpenEnv is a tool for creating agentic execution environments: terminals, browsers, or anything else an agent can interact with. Today, we’re excited to announce that OpenEnv is becoming even more open, so that the future of agent training is open source. Starting today, OpenEnv will be coordinated by a committee that so far includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face. OpenEnv now lives at huggingface/OpenEnv.
The OpenEnv project is supported and adopted by leading organizations across the AI ecosystem, including PyTorch Foundation, vLLM, SkyRL (UCB), Lightning AI, Axolotl AI, Stanford Scaling Intelligence Lab, Mithril, OpenMined, Scaler AI Labs, Scale AI, Patronus AI, Surge AI, Halluminate, Turing, Scorecard, and Snorkel AI.
Why we need OpenEnv to train open source agents
Agent harnesses like Claude Code, Codex, OpenClaw, and Hermes just keep improving. One reason for their improvement is that models like GPT-5.5 and Opus 4.8 are trained to use their respective harnesses. We want those gains with open source models too: training local models that use harnesses effectively, and saving compute by specializing models for specific tasks.
Why we need to be (even) more open
Frontier labs train models and harnesses that, for the most part, fit together hand in glove. The model is trained to use the harness and optimised for its characteristics. Models can generalise beyond these harnesses to some extent, but nothing beats the efficiency of training a model for the harness it will actually use. In the open, this isn’t the case. Developers use any harness, any model, and any inference engine, on whatever use case they value. This freedom is fundamental to the community, but it’s also a challenge that requires infrastructure and tooling to tackle. That’s where OpenEnv comes in. It’s a library that interfaces between harness, environment, and trainer, and works with any model. For this to stick, it will need to be owned by all the major stakeholders.
A protocol layer, not a reward framework
Alongside the governance change, we’re tightening what OpenEnv is. In recent releases, OpenEnv has become an interoperability layer for RL environments. Its job is to standardize how environments are published, deployed, and consumed by agents. It will not dictate how rewards are defined or how training loops work. Reward definition, scoring rubrics, and trainer-specific logic belong in the libraries that specialize in them. OpenEnv is the common socket they can all plug into.
In practice this means:
- One interface, many environments: all environments expose the familiar Gymnasium-style API (reset(), step(), state()), running on a client/server architecture.
- A trainer that speaks OpenEnv: it can drive any compliant environment without bespoke code.
- Familiar protocols and canonical packaging: environments are served over standard protocols like HTTP and WebSocket and packaged with Docker.
- MCP is a first-class citizen: OpenEnv environments are instantly compatible with MCP servers, and the same environment behaves consistently in both simulation (train/eval) and production modes.
- Interop across env libraries: you can define and consume environments across different ecosystems (verifiers, harbor, and others), on the infrastructure and hub of your choice. OpenEnv is the deployment and interface layer underneath them, rather than a competitor to them.
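To make the "one interface, many environments" idea concrete, here is a minimal in-process sketch. It is not OpenEnv's actual API: the names `StepResult`, `Environment`, `EchoEnvironment`, and `rollout` are hypothetical, and the HTTP/WebSocket transport and Docker packaging are left out entirely. It only illustrates how a driver written against reset(), step(), and state() needs no environment-specific code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Protocol


@dataclass
class StepResult:
    """Hypothetical result type returned by reset() and step()."""
    observation: Any
    reward: float
    done: bool


class Environment(Protocol):
    """Hypothetical interface with the three calls named in the post."""

    def reset(self) -> StepResult: ...
    def step(self, action: Any) -> StepResult: ...
    def state(self) -> dict: ...


class EchoEnvironment:
    """Toy environment: echoes each action back and ends after max_steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> StepResult:
        self.step_count = 0
        return StepResult(observation="ready", reward=0.0, done=False)

    def step(self, action: str) -> StepResult:
        self.step_count += 1
        done = self.step_count >= self.max_steps
        # Reward stays 0.0 here: scoring is left to whoever consumes the episode.
        return StepResult(observation=f"echo: {action}", reward=0.0, done=done)

    def state(self) -> dict:
        return {"step_count": self.step_count}


def rollout(env: Environment, policy: Callable[[Any], Any],
            max_steps: int = 10) -> List[StepResult]:
    """Drive any object that implements reset()/step() until it reports done."""
    results = [env.reset()]
    for _ in range(max_steps):
        if results[-1].done:
            break
        action = policy(results[-1].observation)
        results.append(env.step(action))
    return results
```

Because `rollout` depends only on the three-method interface, swapping `EchoEnvironment` for a browser or terminal environment would not change the driver. In a client/server setup, the same methods would presumably be forwarded over the network by a client object.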
What’s next
Over the coming months we will focus on the things that turn OpenEnv from a fast-growing project into a dependable standard:
- Tasksets via datasets: wiring environment tasks to Hugging Face datasets so environments and benchmarks compose cleanly (RFC 006).
- External rewards: letting rewards be defined in whichever library you already use, with OpenEnv as the deployment layer (RFC 007).
- Continued harness integration: first-class support for agentic harnesses.
- End-to-end examples: full training and evaluation walkthroughs in TRL, Unsloth, and beyond.
- Auto-validation: measuring environment quality and contribution to model learning. This will give the community a scalable way to evaluate their environments and drive up quality (think hackathons!) (RFC 008).
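As a rough illustration of the first two items, the sketch below keeps the task list and the reward function outside the evaluation loop. It does not reflect the contents of RFC 006 or RFC 007: the row schema ("prompt", "answer") and the names `exact_match` and `evaluate` are assumptions, and a plain Python list stands in for a dataset.

```python
from typing import Callable, Dict, Iterable

Task = Dict[str, str]                    # hypothetical row schema: "prompt" and "answer"
RewardFn = Callable[[Task, str], float]  # scores one completion against one task


def exact_match(task: Task, completion: str) -> float:
    """Example reward, defined outside the environment: 1.0 on an exact answer."""
    return 1.0 if completion.strip() == task["answer"] else 0.0


def evaluate(tasks: Iterable[Task], agent: Callable[[str], str],
             reward_fn: RewardFn) -> float:
    """Run the agent on every task and return the mean reward."""
    scores = [reward_fn(task, agent(task["prompt"])) for task in tasks]
    return sum(scores) / len(scores) if scores else 0.0


# A plain list stands in for rows that could come from a dataset.
TASKS = [
    {"prompt": "2 + 2", "answer": "4"},
    {"prompt": "3 * 3", "answer": "9"},
]

if __name__ == "__main__":
    always_four = lambda prompt: "4"     # deliberately naive agent
    print(evaluate(TASKS, always_four, exact_match))
```

Under this split, swapping in a different reward function or a different task source would not touch the evaluation loop, which is the kind of separation the roadmap items describe.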
Get involved
OpenEnv is community-centric by design, and it’s still early: expect rough edges, and help us smooth them. Check out the code and RFCs at github.com/huggingface/OpenEnv. Thanks to everyone who helped make this transition happen. Let’s build the common substrate for open-source agentic RL together.