I Gave My OpenClaw Agent a Physical Body
I recently gave my OpenClaw a real robot arm to play with. The results just about blew my own neural network.
The AI agent was able to configure the arm, use it to see and slowly grab things, and even train another AI model to pick up and place specific objects. And they say AGI is still a few years away! (I’m joking; it probably is.)
The results have me convinced that we may be on the brink of a robotics breakthrough. Training and controlling robots used to require considerable skill. Today’s AI models can make it almost easy.
“AI-powered coding is super exciting because it has the potential to bridge the gap between conventional engineering methods, which are reliable but don’t generalize, and contemporary vision-language-action models, which generalize but are not yet reliable,” says Ken Goldberg, a roboticist at UC Berkeley who is exploring the approach.
I told OpenClaw to try moving its new arm and it came up with this little wave.
I bought a prebuilt arm called a LeRobot 101. It’s part of an open-source project from Hugging Face that makes it relatively cheap to start building and experimenting with robotics.
The LeRobot comes with two arms: a controller arm that a person operates using a handle and a trigger, and a follower arm with a camera that replicates those movements. You can train an AI model by teleoperating the controller arm and having the model learn how to move the follower in response to what it sees on the camera.
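To make the idea of learning from teleoperation concrete, here is a minimal sketch in plain Python. It is not the LeRobot API: the function names, the feature-vector observations, and the nearest-neighbor "model" are all assumptions chosen for illustration, standing in for the neural network policy a real setup would train.

```python
import math


def record_demo(observations, controller_positions):
    """Pair each camera-derived observation with the controller arm's joint positions.

    Both arguments are hypothetical: observations are small feature vectors,
    controller_positions are the joint targets the human produced at that moment.
    """
    if len(observations) != len(controller_positions):
        raise ValueError("need exactly one action per observation")
    return list(zip(observations, controller_positions))


def nearest_neighbor_policy(dataset):
    """Return a policy that replays the action from the most similar recorded observation."""
    def policy(observation):
        _, action = min(dataset, key=lambda pair: math.dist(pair[0], observation))
        return action
    return policy


# Two recorded moments from a hypothetical demonstration.
demo = record_demo([(0.0, 0.0), (1.0, 1.0)], [(10, 20), (30, 40)])
policy = nearest_neighbor_policy(demo)
follower_target = policy((0.9, 0.8))  # closest to the second recorded observation
```

The point of the sketch is the data flow: the human supplies actions through the controller arm, the camera supplies observations, and the model learns a mapping from one to the other that the follower arm can then execute on its own.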
Building With OpenClaw
Before using OpenClaw, I spent several hours trying to connect and calibrate the robot, at one point nearly breaking the motors by applying the wrong settings, which caused them to overheat.
Then, with help from OpenClaw and Codex, I was able to vibe code a simple program that closed the claw’s gripper when it spotted a red ball. In the terminal, Codex went through the tricky work of configuring the connections to the robot. Then, with my help, it calibrated the positions of its joints. It also wrote a Python script that used several libraries to identify and grip the ball in question. Vibe coding isn’t perfect, of course, and hallucinations can introduce bugs, especially when working with different hardware, but the results were impressive.
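The script Codex wrote isn't reproduced here, but the core logic of "close the gripper when a red ball is visible" can be sketched with nothing beyond the Python standard library. Everything below is an assumption made for illustration: the hue and saturation thresholds, the frame format (a list of rows of RGB tuples), and the `FakeGripper` class, which stands in for whatever motor command a real arm would need.

```python
import colorsys


def is_red(pixel, min_saturation=0.5, min_value=0.3):
    """Return True if an (R, G, B) pixel with 0-255 channels looks red."""
    r, g, b = (channel / 255.0 for channel in pixel)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle, so accept hues near 0 or near 1.
    near_red = hue <= 0.04 or hue >= 0.96
    return near_red and saturation >= min_saturation and value >= min_value


def red_fraction(frame):
    """Fraction of pixels in a frame (a list of rows of RGB tuples) that look red."""
    pixels = [pixel for row in frame for pixel in row]
    if not pixels:
        return 0.0
    return sum(is_red(pixel) for pixel in pixels) / len(pixels)


class FakeGripper:
    """Hypothetical gripper; a real one would send a command to the motor."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def close_if_ball_visible(frame, gripper, threshold=0.05):
    """Close the gripper when at least `threshold` of the frame is red."""
    if red_fraction(frame) >= threshold:
        gripper.close()
    return gripper.closed
```

A real version would read frames from the arm's camera and would likely look for a round red blob rather than counting pixels, but the threshold-then-act structure is the same.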
Then, with my help, the robot-agent figured out how to identify and grip a red ball.
A neat result, yes, but not exactly Terminator. Next, I tried having OpenClaw help me train a model to control the arm. We experimented with a few different approaches, and OpenClaw was adept at guiding me through the process and checking the error rate of the model after each training run.
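What checking the error after each training run might look like, in rough form: compare the joint positions a model predicts against the ones recorded during demonstrations, then keep the run with the lowest score. The metric (mean absolute joint error), the function names, and the run labels are assumptions for illustration, not the measure OpenClaw actually reported.

```python
def mean_absolute_error(predicted, demonstrated):
    """Average absolute joint error across all time steps and joints."""
    total, count = 0.0, 0
    for predicted_step, demonstrated_step in zip(predicted, demonstrated):
        for p, d in zip(predicted_step, demonstrated_step):
            total += abs(p - d)
            count += 1
    if count == 0:
        raise ValueError("no joint values to compare")
    return total / count


def best_run(errors_by_run):
    """Return the name of the training run with the lowest error."""
    return min(errors_by_run, key=errors_by_run.get)


# Hypothetical held-out demonstration: two time steps, two joints each.
demonstrated = [(10.0, 20.0), (30.0, 40.0)]
errors = {
    "run_a": mean_absolute_error([(12.0, 20.0), (30.0, 44.0)], demonstrated),
    "run_b": mean_absolute_error([(10.0, 21.0), (30.0, 41.0)], demonstrated),
}
winner = best_run(errors)
```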
Finally, the robot arm was able to pick up objects.
Code as Policy
The idea that AI-powered coding could offer a powerful new way to build robots was first highlighted in a research paper from 2022 that dubbed the approach “code as policy.” Since then, AI’s coding skills have advanced at a dizzying pace, and the code-as-policy method has gained traction in many labs.
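In rough terms, code as policy means the model's output is a program that calls a small set of robot primitives, rather than a stream of motor commands. The sketch below shows that shape under loud assumptions: `SimulatedArm`, its two primitives, and the hard-coded `GENERATED_POLICY` string are all hypothetical, with the string standing in for text a language model would return.

```python
class SimulatedArm:
    """Hypothetical arm that only logs the primitive calls it receives."""

    def __init__(self):
        self.log = []

    def move_to(self, x, y, z):
        self.log.append(("move_to", x, y, z))

    def set_gripper(self, closed):
        self.log.append(("set_gripper", closed))


# In a real system this text would come from a language model.
GENERATED_POLICY = """
def pick_and_place(arm, source, target):
    arm.set_gripper(False)   # open before approaching
    arm.move_to(*source)
    arm.set_gripper(True)    # grasp
    arm.move_to(*target)
    arm.set_gripper(False)   # release
"""


def load_policy(source_code, name):
    """Execute generated code in a fresh namespace and return the named function."""
    namespace = {}
    exec(source_code, namespace)
    return namespace[name]


arm = SimulatedArm()
policy = load_policy(GENERATED_POLICY, "pick_and_place")
policy(arm, (0, 0, 0), (1, 1, 0))
```

Because the policy is ordinary code, a person can read it, test it against a simulated arm like this one, and fix it before it ever touches hardware. The same property cuts the other way: `exec` runs whatever it is given, so generated code should be run in a sandbox or reviewed first.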
Goldberg’s research group, together with researchers from Nvidia, Carnegie Mellon University, and Stanford, recently developed a new benchmark called CaP-X to measure the robot capabilities of coding models. Interestingly, CaP-X shows that the best model for programming robots isn’t Claude or ChatGPT but Gemini, perhaps because Google DeepMind has focused on training its models to be multimodal and make sense of the physical world. Along with the benchmark, the researchers created CaP-Gym, an environment that lets coding agents control both simulated and real robots. They also developed CaP-Agent0, an agentic framework that boosts the performance of coding models so much that, on some manipulation tasks, they beat models trained to control a robot’s movements directly.
Goldberg’s team is working with Nvidia to explore the potential of the code-as-policy approach. I spoke to Spencer Huang (none other than Jensen Huang’s son), who has been involved in organizing hackathons inside the company to let people try their hand at vibe coding robots. Huang is currently working on a research project with Goldberg that should make the code-as-policy approach compatible with more robot software tools.
“Nearly anyone can get into robotics, which is the true holy grail,” Huang tells me. Making it possible for people to control robots with spoken or typed commands, or by demonstrating an action, is the “critical unlock for robots in society,” he adds.