Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering
ML engineering agents waste compute rediscovering known techniques because every competition is a cold start. We present HASTE, a hierarchical multi-agent system that organizes cross-competition knowledge into three scope tiers (global, domain, and competition-specific), each coupled to a matching agent level.
An orchestrator coordinates domain specialists and promotes learning between tiers via LLM-driven abstraction. A controlled ablation provides evidence for scoped loading: holding a 159-skill inventory constant across 8 competitions, tiered loading achieves a 100% medal rate, while flat loading reaches only 62.5% (the same medal rate as loading no skills) and consumes 2x the output tokens of tiered loading.
On the full MLE-Bench Lite benchmark (22 Kaggle competitions), HASTE reaches a medal rate of 77.3% using Claude Sonnet 4.6 at 12h per competition. In a cold-start run, the system begins with no accumulated skills. In warm-start runs, it reloads skills learned from earlier competitions, using only global and domain-level skills for transfer across competitions.
Warm starts use 52% fewer refinement iterations, and the fraction of proposed changes kept by the agent rises from 42% at low inventory to 85% once 50+ skills are available. These results suggest that better knowledge organization can partly substitute for model strength and compute budget in ML-engineering agents.
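To make the difference between tiered and flat loading concrete, the following is a minimal Python sketch of scope-based skill selection. It is an illustration under stated assumptions, not HASTE's implementation: the `Skill` record, the `load_tiered` and `load_flat` helpers, and every skill, domain, and competition name are hypothetical. It only encodes the rule described above, that a specialist sees global skills, skills for its own domain, and (within a single competition) that competition's own skills, whereas flat loading exposes the entire inventory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record; field names are assumptions for illustration."""
    name: str
    scope: str                         # "global", "domain", or "competition"
    domain: Optional[str] = None       # set for domain and competition skills
    competition: Optional[str] = None  # set only for competition skills


def load_tiered(inventory: List[Skill], domain: str,
                competition: Optional[str] = None) -> List[Skill]:
    """Select the skills visible to a specialist working in `domain`.

    With competition=None this models a warm start on a new competition:
    only global and matching domain-level skills transfer.
    """
    selected = []
    for skill in inventory:
        if skill.scope == "global":
            selected.append(skill)
        elif skill.scope == "domain" and skill.domain == domain:
            selected.append(skill)
        elif (skill.scope == "competition"
              and competition is not None
              and skill.competition == competition):
            selected.append(skill)
    return selected


def load_flat(inventory: List[Skill]) -> List[Skill]:
    """Flat loading: every skill is exposed regardless of scope."""
    return list(inventory)


# Hypothetical four-skill inventory spanning all three tiers.
inventory = [
    Skill("stratified_cv", "global"),
    Skill("tta_flips", "domain", domain="vision"),
    Skill("tfidf_baseline", "domain", domain="nlp"),
    Skill("leak_in_id_column", "competition",
          domain="vision", competition="comp_a"),
]

warm_start = load_tiered(inventory, domain="vision")
same_competition = load_tiered(inventory, domain="vision", competition="comp_a")
everything = load_flat(inventory)
```

In this toy inventory, a warm start on a new vision competition sees only the global skill and the vision skill, while flat loading also exposes the unrelated NLP skill and another competition's specific finding.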