Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
TL;DR: Building an agent is mostly plumbing (tools, state, guardrails, scaling from one agent to many). CUGA (pip install cuga), short for Configurable Generalist Agent, is the Agent Harness for the Enterprise from IBM. It handles that plumbing, so you write just a tool list and a prompt. We built two dozen single-file apps to prove it. Read one end to end here, then see how the same agent runs sovereign and governed in production without a rewrite.
Most agentic apps start with a week of plumbing before the agent does anything useful. You pick a framework, wire up a model client, write tool adapters, build some way to stream state to a UI, and somewhere in there you also decide what the agent is actually for. The interesting part arrives last. CUGA inverts that. It’s the open-source agent harness from IBM that handles the planning, the execution loop, the tool calls, and the state plumbing for you. What’s left is the part that’s actually yours: which tools the agent can reach, and what you tell it to do.
To show what that feels like in practice, we built cuga-apps: two dozen small, working apps, each a single FastAPI file wrapping one CugaAgent, from a movie recommender to an IBM Cloud architecture advisor. They exist to be read and copied. You can click through the live gallery. This article walks through one of them, names what the harness takes off your plate, and shows where the same code goes when you need it governed for production. No new framework to learn first. If you’ve written a FastAPI route, you can read every line.
Why a harness, not a framework?

The fair question to ask of anything in this space is what it saves you from writing. CUGA’s answer: the orchestration around a model that you’d otherwise rebuild every time. It plans before it acts, then executes with a mix of tool calls and generated code (CodeAct). On a long task that runs twenty steps, the thing that breaks most agents is losing track of intermediate results and re-deriving them (often wrongly) on the next turn. CUGA holds that state and runs a reflection step that can catch a bad call and re-plan instead of barreling ahead. That machinery is built into the harness rather than something you tune by hand, and it is why CUGA has topped agent benchmarks like AppWorld and WebArena.
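To make the plan, execute, and reflect cycle concrete, here is a minimal sketch of the general pattern in plain Python. It is not CUGA's implementation: `Step`, `Run`, `reflect`, and the two toy tools are hypothetical names invented for this illustration. The point is only that intermediate results live in a named variable store, and that a reflection hook can swap in a corrected step before a bad result is saved.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    tool: str      # name of the tool to call
    args: dict     # literal values, or "$name" references into the state
    save_as: str   # variable name the result is stored under


@dataclass
class Run:
    tools: dict[str, Callable]
    max_replans: int = 3
    state: dict = field(default_factory=dict)  # intermediate results live here
    log: list = field(default_factory=list)

    def resolve(self, args: dict) -> dict:
        # Swap "$name" references for stored values instead of re-deriving them.
        return {
            key: self.state[val[1:]] if isinstance(val, str) and val.startswith("$") else val
            for key, val in args.items()
        }

    def execute(self, plan: list[Step], reflect: Callable[[Step, object], Optional[Step]]) -> dict:
        queue = list(plan)
        replans = 0
        while queue:
            step = queue.pop(0)
            result = self.tools[step.tool](**self.resolve(step.args))
            fix = reflect(step, result)  # a replacement Step, or None if the result looks fine
            if fix is not None:
                if replans >= self.max_replans:
                    raise RuntimeError(f"gave up re-planning step {step.save_as!r}")
                replans += 1
                self.log.append(f"replan:{step.save_as}")
                queue.insert(0, fix)  # retry the corrected step before moving on
                continue
            self.state[step.save_as] = result
            self.log.append(f"ok:{step.save_as}")
        return self.state


# Toy tools (hypothetical): a case-sensitive catalog lookup and a counter.
def search(query: str) -> list:
    catalog = {"Heat": ["Heat (1995)"]}
    return catalog.get(query, [])


def count(items: list) -> int:
    return len(items)


def reflect(step: Step, result: object) -> Optional[Step]:
    # Treat an empty search result as a bad call and retry with a corrected query.
    if step.tool == "search" and result == []:
        return Step("search", {"query": step.args["query"].capitalize()}, step.save_as)
    return None


plan = [
    Step("search", {"query": "heat"}, "hits"),
    Step("count", {"items": "$hits"}, "n"),  # reuses the stored result by name
]
run = Run(tools={"search": search, "count": count})
final = run.execute(plan, reflect)
print(final, run.log)
```

The second step never recomputes the search. It reads `hits` from the store, which is the property the paragraph above describes as the difference between a long run staying on course and drifting.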
You also set the cost/latency tradeoff from config rather than code: Fast, Balanced, and Accurate reasoning modes, with code execution in whatever sandbox you trust (local, Docker/Podman, or E2B cloud). Same agent definition, different dial. That dial matters more than it sounds. Most harnesses assume a frontier model sits underneath and lean on it to recover when a plan goes sideways; CUGA does that work itself. The planning, the reflection step, and the variable tracking that keeps a long run on course are the harness carrying load the model would otherwise have to carry, which is what lets a smaller open-weight model hold up where it normally wouldn’t. It’s why the hosted apps run on gpt-oss-120b rather than a frontier API.
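As a rough illustration of "config rather than code", the sketch below reads a reasoning mode and a sandbox choice from the environment and validates them. Everything here is an assumption made for the example: the `DEMO_MODE` and `DEMO_SANDBOX` variable names, the enum values, and the `Settings` class are hypothetical and are not CUGA's actual configuration keys. Check the CUGA documentation for the real ones.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Mode(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    ACCURATE = "accurate"


class Sandbox(Enum):
    LOCAL = "local"
    CONTAINER = "container"  # stands in for Docker or Podman
    E2B = "e2b"


@dataclass(frozen=True)
class Settings:
    mode: Mode
    sandbox: Sandbox


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # DEMO_MODE and DEMO_SANDBOX are hypothetical names used only in this sketch.
    # An unknown value raises ValueError, so a typo fails at startup, not mid-run.
    return Settings(
        mode=Mode(env.get("DEMO_MODE", "balanced").lower()),
        sandbox=Sandbox(env.get("DEMO_SANDBOX", "local").lower()),
    )


defaults = load_settings({})
tuned = load_settings({"DEMO_MODE": "Fast", "DEMO_SANDBOX": "e2b"})
print(defaults, tuned)
```

The agent definition does not appear in this file at all, which is the design point: the same tools and prompt can be paired with either `Settings` value without touching application code.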
None of the individual pieces is unique to CUGA. What’s different is that they come pre-assembled, so you configure them instead of wiring them together. The API you touch is small: build a CugaAgent with a tool list and a prompt, then await agent.invoke(…). Everything below that line is the harness. Concretely, that’s interchangeable tools (OpenAPI, MCP, and LangChain functions all bind the same way), long-horizon planning with variable management and self-correction (the machinery behind #1 on AppWorld from 07/25 to 02/26 and on WebArena from 02/25 to 09/25), declarative guardrails, multi-agent delegation over A2A, Docling-powered RAG, and one-env-var provider switching (pip install cuga, then OpenAI, watsonx, Ollama, and more). Each is something you’d otherwise build yourself. The first word of the name does the work: Configurable. The hard parts are handled, so your job is just the task.
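The outward shape of that API (a tool list and a prompt in, one awaited call out) can be sketched without the harness itself. The `StandInAgent` class below is a placeholder written for this illustration, not the real `CugaAgent`, and its constructor arguments and toy keyword routing are assumptions. A real harness would plan, call tools, and reflect inside `invoke`.

```python
import asyncio
from typing import Callable


class StandInAgent:
    """Hypothetical placeholder with the shape described above: tools + prompt, then invoke."""

    def __init__(self, tools: list[Callable[[str], str]], prompt: str):
        self.tools = {tool.__name__: tool for tool in tools}
        self.prompt = prompt

    async def invoke(self, message: str) -> str:
        # Toy routing only: run the first tool whose name appears in the message.
        for name, tool in self.tools.items():
            if name in message:
                return tool(message)
        return "no tool matched"


def recommend(message: str) -> str:
    # A stub tool; a real one would query a catalog or an API.
    return "Try 'Heat' (1995)."


agent = StandInAgent(tools=[recommend], prompt="You recommend movies.")
answer = asyncio.run(agent.invoke("recommend a heist film"))
print(answer)
```

What the application author writes is the last three statements. The class body is the part the article attributes to the harness.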