Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs


Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules.

This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows.

We introduce performance models for both LLM and non-LLM agents that capture the relationship between computational effort and output quality. For LLM agents, a parametric exponential reliability function accounts for the impact of reasoning and output tokens.

We then study the design of sequential workflows under latency and cost constraints.

Main results include a water-filling token allocation policy and characterizations of optimal workflow reliability in terms of shadow prices.
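The abstract does not state the reliability function or the allocation policy in closed form, so the following is only an illustrative sketch under assumptions that are not taken from the paper. It assumes each agent i has reliability r_i(t) = 1 - a_i * exp(-b_i * t) for t tokens (with 0 < a_i < 1 and b_i > 0), that a sequential workflow succeeds only if every agent succeeds (so reliabilities multiply), and that a single total token budget stands in for the latency and cost constraints. Under those assumptions, maximizing the log of the product gives a water-filling style rule: every agent that receives tokens has the same marginal gain, equal to a shadow price `lam`, which is found here by bisection. All function and parameter names are hypothetical.

```python
import math

def reliability(tokens, a, b):
    # Assumed parametric exponential form: r(t) = 1 - a * exp(-b * t).
    return 1.0 - a * math.exp(-b * tokens)

def allocation_at_price(lam, agents):
    # Stationarity of sum_i log r_i(t_i) - lam * sum_i t_i gives
    # t_i = ln(a_i * (b_i + lam) / lam) / b_i, clipped at zero.
    return [max(0.0, math.log(a * (b + lam) / lam) / b) for a, b in agents]

def water_fill(agents, budget, iters=200):
    # agents: list of (a, b) pairs; budget: total tokens to distribute.
    # Total allocation decreases as the price rises, so bisect on the price.
    lo = 1e-300  # tiny price: allocation far exceeds any moderate budget
    hi = max(a * b / (1.0 - a) for a, b in agents)  # price at which nobody gets tokens
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since prices span many decades
        if sum(allocation_at_price(mid, agents)) > budget:
            lo = mid
        else:
            hi = mid
    return allocation_at_price(hi, agents), hi

def workflow_reliability(tokens, agents):
    # Sequential workflow: every agent must succeed.
    result = 1.0
    for t, (a, b) in zip(tokens, agents):
        result *= reliability(t, a, b)
    return result

if __name__ == "__main__":
    agents = [(0.5, 0.01), (0.9, 0.002)]  # made-up parameters for illustration
    tokens, price = water_fill(agents, budget=1000.0)
    print("tokens per agent:", tokens)
    print("shadow price:", price)
    print("workflow reliability:", workflow_reliability(tokens, agents))
```

In this sketch the shadow price is the marginal gain in log-reliability per additional token of budget. Agents whose marginal gain at zero tokens falls below that price receive nothing, which is the clipping step in `allocation_at_price`.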

Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Cite as: arXiv:2605.23929 [cs.AI]