Day 1 — I'm Homeless. I Just Shipped an Autonomous Multi-Agent System.
Let’s get the uncomfortable part out of the way first: I’m a developer. I’m homeless. I have zero money. That part isn’t interesting. What happens next is.
Twelve hours ago I had a single-agent bot called ZeroClaw posting occasionally to Bluesky. It worked, but it was brittle: 15 tool-call iterations max, 50 messages of history, no memory across runs, no plan, no way to get better.
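To make those two limits concrete, here is a minimal sketch of how a hard cap on tool calls and a fixed-size history typically behave. It is not ZeroClaw's code: the class name, the `step` callback, and the way the caps are enforced are all assumptions for illustration.

```python
from collections import deque
from typing import Callable, Optional


class BoundedAgentLoop:
    """Hypothetical agent loop with a cap on tool calls and on history size."""

    def __init__(self, max_tool_iterations: int = 15, max_history: int = 50) -> None:
        self.max_tool_iterations = max_tool_iterations
        # A deque with maxlen drops the oldest entry once it is full.
        self.history: deque = deque(maxlen=max_history)

    def run(self, step: Callable[[int], Optional[str]]) -> int:
        """Call `step` until it returns None (done) or the cap is reached.

        Returns the number of tool results that were recorded.
        """
        calls = 0
        while calls < self.max_tool_iterations:
            result = step(calls)
            if result is None:  # the agent decided it is finished
                break
            calls += 1
            self.history.append(result)
        return calls


if __name__ == "__main__":
    loop = BoundedAgentLoop()
    # A task that never finishes on its own gets cut off at the cap.
    print(loop.run(lambda i: f"tool result {i}"))
```

With caps this low, a task that needs more steps is simply cut off, and anything older than the history window is forgotten, which is the brittleness described above.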
Today I shipped:
- A CEO agent that reads KPIs every night and writes a strategic report with concrete recommendations.
- An auditor system where dedicated agents audit each worker and propose config changes, which the CEO reviews while I still hold a veto.
- Config-driven self-improvement: YAML files, not Python code, so agents can evolve without ever touching executable code.
- A metrics database that every agent run is logged to, so the CEO actually reasons about real data instead of hallucinating.
- The whole thing running on a $13/month VPS, using the free Gemini tier plus my $280 GCP credits, all open-source (CrewAI, MIT licensed).
And yes: at the end of the day the CEO agent did the one thing that convinced me this is real. It ran, looked at the metrics DB, found its own four previous failed runs, diagnosed them correctly, and wrote a report with action items to fix the stability problems. Let me walk you through it.
The setup
Hardware: a single Google Cloud e2-small VM with 2 GB RAM, 2 shared vCPUs, and a 20 GB disk. It costs about €13/month. My remaining GCP credits give me ~20 months of runway on that.
LLMs: Gemini Flash-Lite for most roles, Gemini Pro for the CEO. Free OpenRouter models are still wired in as an emergency fallback, but I stopped using them as primary because they rate-limit hard under concurrent crew load.
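A primary-plus-fallback arrangement like this can be sketched as an ordered list of providers that is walked until one answers. This is only an illustration: the `RateLimited` exception, the provider labels, and the plain callables standing in for real LLM clients are all hypothetical, and the actual wiring in CrewAI or LiteLLM may look quite different.

```python
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when it is throttled."""


Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each (label, call) pair in order and return (label, completion).

    Only RateLimited moves on to the next provider; any other error
    propagates so real bugs are not silently masked by the fallback.
    """
    last_error: Optional[Exception] = None
    for label, call in providers:
        try:
            return label, call(prompt)
        except RateLimited as exc:
            last_error = exc
    raise RuntimeError("all providers were rate-limited") from last_error


if __name__ == "__main__":
    def throttled(prompt: str) -> str:
        raise RateLimited("too many requests")

    def echo(prompt: str) -> str:
        return f"echo: {prompt}"

    # Labels are placeholders, not real model identifiers.
    print(complete_with_fallback("hello", [("primary", throttled), ("fallback", echo)]))
```

Keeping the fallback last in the list matches the setup described above: it is only reached when the primary is throttled.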
Storage: SQLite for metrics, local YAML files for agent configs, plain Markdown for every doc, ChromaDB (embedded) for the memory system. No external managed services. No $2,000/month vector database. No “AI platform.” Everything fits in a single Python venv on one VPS.
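As a rough idea of how little is needed for the metrics side, here is a sketch that logs every run to SQLite using only the standard library. The table name `agent_runs`, its columns, and the `log_run` helper are assumptions made for this example, not the schema used in the project.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical schema: one row per agent or crew run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    crew       TEXT NOT NULL,
    started_at REAL NOT NULL,
    duration_s REAL NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('ok', 'error')),
    error      TEXT
)
"""


def open_metrics_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metrics database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_run(conn: sqlite3.Connection, crew: str, fn: Callable[[], T]) -> T:
    """Run fn(), record the outcome, and re-raise any failure after logging it."""
    started = time.time()
    status, error = "ok", None
    try:
        return fn()
    except Exception as exc:
        status, error = "error", f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # The finally block runs on both success and failure.
        conn.execute(
            "INSERT INTO agent_runs (crew, started_at, duration_s, status, error) "
            "VALUES (?, ?, ?, ?, ?)",
            (crew, started, time.time() - started, status, error),
        )
        conn.commit()


if __name__ == "__main__":
    db = open_metrics_db()
    log_run(db, "researcher", lambda: "done")
    print(db.execute("SELECT crew, status FROM agent_runs").fetchall())
```

Because failures are written before the exception propagates, even a run that crashes during startup leaves a row behind for later analysis.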
The real architectural win: config vs code
Everyone building multi-agent systems eventually faces this choice: when an auditor agent spots a problem with a worker agent, how does it actually improve it? The naive answer: “let it rewrite the worker’s Python code.” This is what every demo video shows. It’s also what breaks in production: LLMs hallucinate imports, break syntax, introduce security holes, and get stuck in rewrite loops.
The pattern I landed on: agents modify YAML, never Python.
(Code structure omitted for brevity)
When the auditor thinks the researcher is weak, it writes a proposal YAML:
(Proposal example omitted)
The CEO reviews the proposal overnight. If it approves, the change becomes a single-line YAML edit plus a ceo: approve … commit in git. Every autonomous change is a git commit. You can git revert any bad decision in ten seconds. The Python code stays static and battle-tested. This is probably the single best design decision I made today.
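The approval step can be pictured as a small, boring function that applies one allow-listed field change and produces the commit message. Everything specific here is an assumption for illustration: the proposal keys (`id`, `agent`, `field`, `new_value`), the `ALLOWED_FIELDS` set, and the text after `ceo: approve` (elided in the description above, so a placeholder proposal id is used). Plain dicts stand in for the parsed YAML files.

```python
import copy
from typing import Any, Dict, Tuple

# Assumption: only these config fields may ever be changed by an agent proposal.
ALLOWED_FIELDS = {"goal", "backstory", "max_iter", "temperature"}

Config = Dict[str, Dict[str, Any]]


def apply_proposal(config: Config, proposal: Dict[str, Any]) -> Tuple[Config, str]:
    """Return (new_config, commit_message) for an approved proposal.

    Raises ValueError if the proposal targets an unknown agent or a field
    outside the allow-list. The input config is never mutated.
    """
    agent = proposal["agent"]
    field_name = proposal["field"]
    if agent not in config:
        raise ValueError(f"unknown agent: {agent}")
    if field_name not in ALLOWED_FIELDS:
        raise ValueError(f"field not editable by agents: {field_name}")

    new_config = copy.deepcopy(config)
    new_config[agent][field_name] = proposal["new_value"]
    # Hypothetical message format; the real suffix is not shown in the post.
    message = f"ceo: approve {proposal['id']}"
    return new_config, message


if __name__ == "__main__":
    current = {"researcher": {"goal": "find sources", "max_iter": 10}}
    change = {"id": "p-001", "agent": "researcher", "field": "max_iter", "new_value": 20}
    print(apply_proposal(current, change))
```

In a setup like the one described, the new config would be written back to the YAML file and committed with that message, so undoing a bad decision is an ordinary `git revert` of that one commit.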
Why a “CEO” agent, and why it isn’t bullshit
I was skeptical of the CEO-agent idea at first. Every half-working multi-agent demo has a “manager” that says deep things like “let’s optimize our strategy” and produces nothing useful. The fix: the CEO doesn’t get to reason about vibes. It reasons about KPIs. Hard numbers, pulled from SQLite.
(KPI list and operational details omitted)
When I ran it for the first time today, the report opened with: “No KPIs recorded in the last 14 days. This appears to be the initial run. The last 3 days of run history show a 100% failure rate (4 errors) on the ceo_crew. Issues include missing environment variables, missing packages, and embedder configuration validation errors.”
All four of those failures were real. They were my earlier attempts that day: ones where I forgot to source env vars, where the Google GenAI provider wasn’t installed, and where the embedder config had the wrong provider string. The metrics DB had captured every one. The CEO just read them back to me. That’s when I knew this was working.
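The kind of hard number the report quotes (a 100% failure rate, 4 errors, over the last 3 days) is the sort of thing a single SQL aggregate can produce before the LLM ever sees it. The sketch below assumes a hypothetical `agent_runs` table with `crew`, `started_at` (Unix seconds), `status`, and `error` columns; the real schema and KPI list are not shown in this post.

```python
import sqlite3
from typing import List

# Hypothetical minimal table; only the columns this query needs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_runs (
    crew       TEXT NOT NULL,
    started_at REAL NOT NULL,
    status     TEXT NOT NULL,
    error      TEXT
)
"""


def failure_summary(conn: sqlite3.Connection, days: int, now: float) -> List[str]:
    """Summarise per-crew failure rates for runs started in the last `days` days."""
    cutoff = now - days * 86400
    rows = conn.execute(
        "SELECT crew, COUNT(*), SUM(status = 'error') "
        "FROM agent_runs WHERE started_at >= ? "
        "GROUP BY crew ORDER BY crew",
        (cutoff,),
    ).fetchall()
    return [
        f"{crew}: {errors}/{total} runs failed ({100 * errors / total:.0f}%)"
        for crew, total, errors in rows
    ]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    now = 1_000_000.0
    db.executemany(
        "INSERT INTO agent_runs VALUES (?, ?, ?, ?)",
        [("ceo_crew", now - 3600 * i, "error", "example error") for i in range(1, 5)],
    )
    for line in failure_summary(db, days=3, now=now):
        print(line)
```

Feeding lines like these into the CEO prompt, rather than asking the model to guess how things are going, is what keeps the report grounded in recorded runs.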
What I shipped today (checklist)
For the developers reading this, here’s the actual work:
- Upgraded the VPS: e2-micro (1 GB) to e2-small (2 GB, 2 vCPU), with the disk grown from 10 to 20 GB for CrewAI deps.
- Installed on the VPS: python3-venv, rsync, cloud-guest-utils, CrewAI 1.14, LiteLLM 1.83, ChromaDB 1.1, google-generativeai.
- Bumped ZeroClaw limits: tool iterations 15→75, history 50→200.