$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

Abstract: Edge deployments of generative inference increasingly face two practical realities: per-device, per-model performance is often unknown at deployment time, and it is non-stationary because of user-driven semantic events, background load, and device churn. Consequently, a resource manager tuned offline under a fixed regime can become brittle and expensive to maintain.

This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AI-generated content (AIGC). $E^3$-Agent separates a fast-path router, which makes millisecond-level dispatch decisions, from a slow-path, event-driven large language model (LLM) meta-controller. The meta-controller mitigates regime shifts through a small, explicit control surface exposed via a tool interface, which includes risk gating, router configuration, and rapid performance calibration.
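
The fast-path/slow-path split can be illustrated with a minimal sketch. It is not the paper's implementation: every name here (`RouterConfig`, `FastPathRouter`, `ControlSurface`, `risk_threshold_ms`) is hypothetical, and the dispatch rule (least expected finish time among devices that pass the risk gate) is an assumption chosen only to make the separation concrete. The LLM meta-controller itself is omitted; the sketch shows only the kind of tool calls it might issue.

```python
from dataclasses import dataclass, field


@dataclass
class RouterConfig:
    # Hypothetical knob: (device, model) pairs whose estimated service time
    # exceeds this threshold are gated out as too risky.
    risk_threshold_ms: float = 500.0


@dataclass
class FastPathRouter:
    """Cheap per-request dispatch; never calls the LLM."""
    config: RouterConfig = field(default_factory=RouterConfig)
    est_ms: dict = field(default_factory=dict)    # (device, model) -> estimate
    queue_ms: dict = field(default_factory=dict)  # device -> queued work

    def dispatch(self, model, devices):
        inf = float("inf")

        def estimate(d):
            return self.est_ms.get((d, model), inf)

        def expected_finish(d):
            return self.queue_ms.get(d, 0.0) + estimate(d)

        allowed = [d for d in devices
                   if estimate(d) <= self.config.risk_threshold_ms]
        # Fall back to all devices if the gate excludes every candidate.
        candidates = allowed or list(devices)
        return min(candidates, key=expected_finish)


class ControlSurface:
    """Small, explicit tool set a slow-path controller could call on events."""

    def __init__(self, router):
        self.router = router

    def set_risk_gate(self, threshold_ms):
        self.router.config.risk_threshold_ms = threshold_ms

    def configure_router(self, config):
        self.router.config = config

    def calibrate(self, device, model, measured_ms):
        # Overwrite a stale estimate with a fresh measurement.
        self.router.est_ms[(device, model)] = measured_ms
```

In this sketch the router reads only local state, so each dispatch stays cheap, while the controller changes behavior solely through the three tool methods and never touches individual requests.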

The agent learns online from execution feedback and continuously adapts to unknown and time-varying service-time mappings. We evaluate $E^3$-Agent in a discrete-event simulator driven by MLPerf-derived device-model measurement priors, covering cold-start warmup and three dynamic regimes: semantic dynamics, device churn, and hidden drift.
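
As a rough illustration of learning service times from execution feedback, the sketch below seeds estimates from measurement priors and blends in each observed service time. The exponentially weighted moving average is a stand-in chosen for brevity, not the update rule used by $E^3$-Agent, and the class name, the `alpha` parameter, and the example values are all assumptions.

```python
class ServiceTimeEstimator:
    """Tracks per-(device, model) service time from priors plus feedback."""

    def __init__(self, priors, alpha=0.3):
        # priors: dict mapping (device, model) -> prior service time in ms,
        # e.g. taken from offline benchmark measurements.
        self.estimates = dict(priors)
        self.alpha = alpha  # weight given to the newest observation

    def update(self, device, model, observed_ms):
        key = (device, model)
        old = self.estimates.get(key)
        if old is None:
            # No prior for this pair: start from the first observation.
            new = observed_ms
        else:
            new = (1 - self.alpha) * old + self.alpha * observed_ms
        self.estimates[key] = new
        return new


# Illustrative values only: a pair that runs slower than its prior suggests.
est = ServiceTimeEstimator({("gpu", "m"): 100.0}, alpha=0.5)
est.update("gpu", "m", 200.0)  # estimate moves halfway toward 200
```

A larger `alpha` tracks drift faster but is noisier; a smaller one smooths noise but reacts slowly to regime shifts, which is the trade-off that any such online estimator has to manage.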

Across the dynamic scenarios, $E^3$-Agent reduces average latency by 65%-73% compared to the best static baseline, stays within 7%-10% of an online full-information Oracle used for evaluation, and effectively suppresses the stutter rate under semantic degradation.
