Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary

Abstract: We decompose an evolutionary mixture-of-LoRA system on a from-scratch ~150M-parameter widened-D substrate (D=1536, V=32000; D/V ≈ 0.048; the “widened-1536” substrate) into three factors: (1) a router rewrite (a parallel sigmoid gate with a learnable per-adapter floor and a bounded temperature anneal, fed post-stack hidden states rather than token-embedding means); (2) a per-domain leave-one-out evaluation scope; and (3) a lifecycle comprising death, alpha-blend inheritance, SVD mutation, and slot reallocation. We report a partial 2^3 factorial design covering 5 of the 8 cells, with n=3 seeds and 25000 adaptation steps per cell.
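
To make the router description concrete, the following is a minimal sketch of one way a parallel sigmoid gate with a per-adapter floor and a bounded temperature anneal could be written. It is not the paper's implementation: the floor parameterization (`floor + (1 - floor) * sigmoid(logit / T)`), the linear anneal schedule, the temperature endpoints, and all names (`gate_weights`, `annealed_temperature`, `router_rows`, `floor_logits`) are assumptions for illustration. The `hidden` vector stands in for a pooled post-stack hidden state.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def annealed_temperature(step, total_steps, t_start=2.0, t_end=0.5):
    """Linear anneal from t_start to t_end, clamped to stay within that range.

    t_start and t_end are illustrative values, not taken from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


def gate_weights(hidden, router_rows, floor_logits, temperature):
    """One independent sigmoid gate per adapter (no softmax competition).

    hidden        -- pooled post-stack hidden state (assumed input)
    router_rows   -- one weight vector per adapter (hypothetical linear router)
    floor_logits  -- learnable per-adapter floor, stored as a logit
    """
    gates = []
    for row, floor_logit in zip(router_rows, floor_logits):
        logit = sum(h * w for h, w in zip(hidden, row))
        floor = sigmoid(floor_logit)  # keeps the floor inside (0, 1)
        # The gate can never fall below its floor, so no adapter is fully silenced.
        gates.append(floor + (1.0 - floor) * sigmoid(logit / temperature))
    return gates


hidden = [1.0, -2.0]
router_rows = [[0.5, 0.5], [-1.0, 0.0], [0.0, 3.0]]
floor_logits = [-3.0, -3.0, -3.0]
temp = annealed_temperature(step=12500, total_steps=25000)
gates = gate_weights(hidden, router_rows, floor_logits, temp)
print(temp, gates)
```

Because each gate is computed independently, the weights need not sum to one, which is the sense in which the gate is "parallel" in this sketch.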

The attribution chain is sharp on this substrate. The router rewrite accounts for the entire improvement attributed to “the full evolutionary system vs the static B3 baseline”: +0.0426 nats of balanced log-PPL (Delta = log PPL_ref - log PPL_test, positive = improvement; t=12.86, p=0.006). The headline full-system-vs-B3 balanced contrast itself is only +0.015 nats (t=1.94, p=0.19 at n=3) and does not clear alpha=0.05.

The per-domain evaluation scope is null at seed resolution, and the lifecycle is a net drag of approximately -0.028 nats (t=-4.46, p=0.047 in the primary chain). An auxiliary alpha=0 inheritance counterfactual at n=3 seeds is sign-inconsistent on the headline metric and underpowered to support either an equivalence conclusion or a load-bearing conclusion. (This corrects an earlier analysis whose arithmetic-mean aggregator erroneously cleared inheritance; see Appendix B.11.)
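
As a reading aid for the reported statistics, the sketch below recomputes p-values from the quoted t-statistics under the assumption that each contrast is a two-sided t-test over n=3 seeds, giving 2 degrees of freedom. The abstract does not state the test explicitly, so this is an assumption; under it, the Student t distribution with df=2 has a closed-form CDF and no statistics library is needed. The function name `two_sided_p_df2` and the dictionary labels are illustrative.

```python
import math


def two_sided_p_df2(t: float) -> float:
    """Two-sided p-value for Student's t with 2 degrees of freedom.

    For df=2 the CDF is F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so the two-sided tail mass is 1 - |t| / sqrt(t^2 + 2).
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# (t-statistic, p-value as quoted in the abstract)
reported = {
    "router rewrite": (12.86, 0.006),
    "full system vs B3": (1.94, 0.19),
    "lifecycle": (-4.46, 0.047),
}

for name, (t_stat, p_quoted) in reported.items():
    p = two_sided_p_df2(t_stat)
    print(f"{name}: t={t_stat:+.2f}  recomputed p={p:.4f}  quoted p={p_quoted}")
```

Under this assumption the recomputed values agree with the quoted ones to the precision given. With only 2 degrees of freedom, a t-statistic near 4.3 is needed to reach alpha=0.05, which is why the lifecycle contrast at t=-4.46 clears the threshold only narrowly and the headline contrast at t=1.94 does not.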

A base-perturbation probe directionally refutes a “genomic-context” reframing of the lifecycle's role. A controllable synthetic sandbox locates a substrate-conditional regime boundary: evolutionary search on the routing channel is load-bearing only when adapters are pre-aligned to the task; in every other regime tested, it underperforms the gradient solution, ties it, or actively degrades it.
