Thousand Token Wood: shipping a multi-agent economy on a 3B model
A Build Small Hackathon field report on what a 3-billion-parameter council of traders can and cannot do. Try it first: the Space, and the open agent traces.
I built Thousand Token Wood for the Build Small Hackathon. It is a tiny economy: five woodland creatures, each its own agent on Qwen2.5-3B, trade five goods for pebbles, gossip, hoard, and panic. You poke the wood and watch bubbles, crashes, and a widening wealth gap appear on their own. The model is served with vLLM on Modal; a Gradio app is the window onto the wood.
This is a field report on the engineering, written for people who build with small models. The short version: a 3B model is a reliable format generator and an unreliable reasoner, emergent systems need designed scarcity, and the best demos sit where a technical constraint meets something you already understand deeply.
Why small is the design, not the limit
A living economy needs many agents thinking many times per run. That is exactly where a frontier model is the wrong tool: too slow and too costly to run a council of traders every tick. A small model is what makes a real-time multi-agent simulation feasible. Every creature decides in a single batched GPU call per turn.
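A minimal sketch of what "one batched call per turn" can look like in the simulation loop. The names `decide_turn`, `build_prompt`, and `generate_batch` are hypothetical, and the backend is left as a plain callable because this report does not show how the vLLM endpoint on Modal is actually invoked.

```python
from typing import Callable, Dict, List, Sequence


def decide_turn(
    creatures: Sequence[str],
    build_prompt: Callable[[str], str],
    generate_batch: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Collect one prompt per creature and send them to the model together.

    `generate_batch` is an assumed stand-in for whatever serves the model;
    it takes a list of prompts and returns one raw reply per prompt.
    """
    prompts = [build_prompt(name) for name in creatures]
    replies = generate_batch(prompts)  # a single call for the whole council
    if len(replies) != len(prompts):
        raise ValueError("backend must return one reply per prompt")
    return dict(zip(creatures, replies))
```

Keeping the backend behind a callable also makes the loop easy to exercise with a stub that returns canned replies, without a GPU.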
The first economy was dead on arrival
The naive version did nothing. Production outran consumption, so every creature was self-sufficient and never had a reason to trade. The market cleared once and went silent. The fix was to engineer scarcity:
- Diet variety: a creature can eat only one unit of any single food per meal, so surviving means buying foods it does not grow.
- Spoilage: perishable food rots if hoarded, forcing surplus to be sold while it still has value.
- A winter fuel crisis: every creature must burn firewood each turn, the need rises over time, and only one creature makes firewood.
That last mechanic drives the drama. One supplier cannot meet rising demand, so the woodcutter gets rich and everyone else competes for warmth.
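The three scarcity rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the good names other than acorns, honey, and firewood, the meal size, the spoilage cap, and the firewood ramp are all invented for the example.

```python
from dataclasses import dataclass, field

FOODS = {"acorns", "berries", "honey"}  # assumed set of food goods
PERISHABLE = {"berries", "honey"}       # assumed to rot when hoarded
MEAL_SIZE = 2                           # assumed: distinct foods needed per meal


@dataclass
class Creature:
    name: str
    stock: dict = field(default_factory=dict)


def eat(creature: Creature) -> bool:
    """Diet variety: at most one unit of each food per meal."""
    eaten = 0
    for food in sorted(FOODS):
        if eaten == MEAL_SIZE:
            break
        if creature.stock.get(food, 0) > 0:
            creature.stock[food] -= 1
            eaten += 1
    return eaten == MEAL_SIZE  # False means the creature went hungry


def spoil(creature: Creature, keep: int = 2) -> None:
    """Spoilage: perishable units beyond `keep` rot at end of turn."""
    for good in PERISHABLE:
        if creature.stock.get(good, 0) > keep:
            creature.stock[good] = keep


def firewood_need(turn: int, base: int = 1, ramp_every: int = 5) -> int:
    """Winter: the per-turn firewood need rises as the run goes on."""
    return base + turn // ramp_every


def burn(creature: Creature, turn: int) -> bool:
    """Burn this turn's firewood; return True if the creature stayed warm."""
    need = firewood_need(turn)
    have = creature.stock.get("firewood", 0)
    creature.stock["firewood"] = have - min(have, need)
    return have >= need
```

With these rules a creature holding only its own crop fails `eat`, which is what gives it a reason to trade.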
Valid JSON, weak judgment
With scarcity in place, the honest small-model lesson surfaced. The 3B emitted valid JSON on 100% of calls, but its economic judgment was poor: a creature that produced acorns would post an order to buy acorns, the one thing it had in surplus. The fix was not a bigger model but a sharper prompt. I told each agent what it produced and must never buy, computed the exact list of goods it was short on, and gave it one worked example. Decision quality jumped and the creatures began trading to their roles.
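A sketch of a prompt builder along those lines: state the role, forbid buying the creature's own product, compute the shortage list in code rather than asking the model to infer it, and include one worked example. The function name, wording, and JSON action shape are assumptions for illustration; the real prompt is in the open traces.

```python
import json


def build_prompt(name, produces, stock, needs, reference_prices):
    """Assemble a role-aware prompt (hypothetical wording and schema)."""
    # Shortages are computed here so the model does not have to reason them out.
    shortages = sorted(
        good
        for good, need in needs.items()
        if good != produces and stock.get(good, 0) < need
    )
    # One worked example: sell the own product, buy the first shortage if any.
    actions = [{"type": "sell", "good": produces, "qty": 1,
                "price": reference_prices.get(produces, 1)}]
    if shortages:
        actions.append({"type": "buy", "good": shortages[0], "qty": 1,
                        "price": reference_prices.get(shortages[0], 1)})
    example = {"thought": "Sell my surplus, buy what I lack.", "actions": actions}
    lines = [
        f"You are {name}. You produce {produces}.",
        f"Never buy {produces}; you already have a surplus of it.",
        "You are short on: " + (", ".join(shortages) or "nothing"),
        f"Your stock: {json.dumps(stock, sort_keys=True)}",
        f"Reference prices: {json.dumps(reference_prices, sort_keys=True)}",
        "Reply with JSON only, in the shape of this example:",
        json.dumps(example),
    ]
    return "\n".join(lines)
```

The design choice is to move every piece of arithmetic the model tended to get wrong out of the model and into the prompt, leaving it only the trade decision.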
The whole loop is wrapped in a tolerant JSON parse-and-repair layer, so a malformed response degrades to a no-op instead of crashing the simulation.
A second lesson came from wellbeing. I first modeled it as an accumulator, and any chronic shortfall ground every creature to zero over a run, a death spiral that was no fun to watch and that punished the agents’ imperfect optimization. I reframed it as a mean-reverting mood that recovers when a creature is fed and warm and never hits zero. Stakes belong in pebbles, prices, and status, not starvation.
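Both guards can be sketched briefly. The repair steps (pulling out the outermost braces, dropping trailing commas), the `actions` schema, and the mood constants are assumptions chosen for illustration; the project's actual repair rules and mood formula are not given in this report.

```python
import json
import re

VALID_TYPES = {"buy", "sell"}  # assumed action vocabulary


def parse_actions(raw: str) -> list:
    """Return a list of action dicts, or [] (a no-op) if nothing is usable."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # skip chatter and fences
    if not match:
        return []
    # Naive repair: drop trailing commas before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", match.group(0))
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    actions = data.get("actions", []) if isinstance(data, dict) else []
    if not isinstance(actions, list):
        return []
    return [a for a in actions
            if isinstance(a, dict) and a.get("type") in VALID_TYPES]


def update_mood(mood: float, fed: bool, warm: bool,
                rate: float = 0.3, floor: float = 0.05) -> float:
    """Mean-reverting mood: drift toward a target set by current conditions."""
    target = 0.2 + 0.4 * int(fed) + 0.4 * int(warm)  # assumed targets in [0.2, 1.0]
    new_mood = mood + rate * (target - mood)
    return max(floor, min(1.0, new_mood))
```

Because the mood moves a fixed fraction toward its target each turn, a long shortage settles at a low plateau instead of accumulating toward zero, and one good turn starts the recovery. The trailing-comma regex is deliberately crude and could alter a string value that happens to contain `,}`; a stricter repair would tokenize first.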
Then it started telling stories
The feature I am most pleased with ties the project to market history. The player can draw a Wood Legend: a famous episode reskinned as woodland folklore. Tulip Mania becomes the Great Acorn Mania. The South Sea Bubble becomes the Hollow Log Trading Company. The 1929 bank runs become the Run on Oona’s Hoard. These are not flavor text. Each legend fires real shocks, and the agents react.
In one run I drew the Run on Oona’s Hoard, the rumor that the owl’s vault was empty. Oona began liquidating her honey to raise pebbles, and the flood of supply crashed the honey price from 10 to 3 over the next turns. A reskinned bank run made an agent dump assets and moved a market price. None of it was scripted.
For that to be visible, prices had to move. At first they were frozen, because the agents simply quoted back the reference price I showed them. The fix was to let the market reference drift with residual supply and demand after each round: heavy unfilled buying pushes a price up, a glut pushes it down. Prices now trend during scarcity and stay calm in balanced trade.
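One way to express that drift rule is sketched below. The function name, the normalisation by total order volume, the sensitivity, and the price floor are assumptions; the report only states the direction of the effect.

```python
def drift_reference(price: float, filled: int, unfilled_buy: int,
                    unfilled_sell: int, sensitivity: float = 0.2,
                    floor: float = 1.0) -> float:
    """Move the reference price toward the side with leftover orders.

    `filled` is the volume that traded this round; the unfilled counts are
    the residual demand and supply left on the book afterwards.
    """
    volume = filled + unfilled_buy + unfilled_sell
    if volume == 0:
        return price  # nobody quoted this good, so leave the reference alone
    imbalance = (unfilled_buy - unfilled_sell) / volume  # in [-1, 1]
    return max(floor, price * (1 + sensitivity * imbalance))
```

Dividing by total volume keeps the reference nearly still when most orders fill, which matches the calm behaviour in balanced trade, while a round that is almost all unfilled selling moves it by close to the full sensitivity.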
What actually happened
A representative fifteen-turn run, with a drought and a winter rumor injected partway:
| Metric | Result |
|---|---|
| Valid JSON actions | 100% (75 of 75 calls) |
| Trades per turn | sustained 3 to 9, never silent |
| Honey price | crashed 10 to 3 during the bank-run legend |
| Firewood price | rose 4 to 7 as winter scarcity bit |
| Wealth gap (Gini) | widened 0.14 to 0.38 |
| Outcome | the woodcutter ended richest, the hoarder broke |
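The wealth-gap row reports a Gini coefficient. A minimal sketch of the standard mean-absolute-difference form over pebble holdings follows; the function name and the handling of an all-zero economy are assumptions, and the pebble counts in the usage note are made up rather than taken from the run.

```python
def gini(wealth: list) -> float:
    """Gini coefficient: 0 is perfect equality; the maximum is (n - 1) / n."""
    n = len(wealth)
    total = sum(wealth)
    if n == 0 or total == 0:
        return 0.0  # assumed convention when nobody holds anything
    # Sum of absolute differences over all ordered pairs of holders.
    pair_diffs = sum(abs(a - b) for a in wealth for b in wealth)
    return pair_diffs / (2 * n * total)
```

For five creatures holding equal pebbles this returns 0.0, and if one creature held everything it would return 0.8, the ceiling for a population of five.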
The reasoning behind every one of those moves is in the open traces dataset: each row is a creature’s full prompt, raw response, parsed actions, and private thought.
Takeaways for building with small models
Most of the engineering is closing the gap between a small model’s reliable formatting and its unreliable reasoning, with structure and prompting rather than scale. Emergent systems need designed scarcity; abundance is boring. And the most compelling small-model demos do not need invented drama. Three centuries of market history had it ready, and a council of 3B agents was enough to play it out. Small models, big adventures. Try the Space.