Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Guides have aided humanity throughout history. Prehistoric civilizations understood that the sun and the moon could be used to navigate vast distances on land and on the high seas. Over time, repeated journeys made it possible to produce maps, which improved planning and shortened travel to familiar destinations. Centuries later, the introduction of the compass enabled seafarers to seek out unexplored destinations with greater accuracy. And today, GPS navigation apps guide our every journey.

In today’s world of agentic AI, AI agents have the potential to enable scalable AI adoption and transform industries as we know them. Realizing that potential, however, requires an intelligent guide: agent logic, which drives high agent quality, cost-effectiveness, and, as a consequence, end-user trust.

Enterprise Workflows & Use Cases

Numerous studies have cited the overwhelming failure rate of AI pilots, while others have highlighted that AI must operate at the core of enterprise workflows to enable scalable adoption. Understanding both the phenomenon and the assertion requires a closer look at enterprise workflows. These workflows:

A. Are dynamic and long-running
B. Span a plethora of APIs, databases, and services
C. Are often constrained by business policies and/or regulations

Given these characteristics, an agent needs an expanded model context to function effectively. State-of-the-art frontier LLMs certainly offer one, but at what tradeoff: more hallucinations, higher token consumption? Further, can LLMs be equipped with an intelligent guide, a GPS of sorts, that enables agentic AI execution at the core of the workflow and drives more desirable outcomes?

We tested these hypotheses by designing and building agents, equipped with pertinent agent logic, for IBM offerings that reflect the characteristics above. These offerings address some of the most challenging tasks facing the subject matter experts who own various stages of the enterprise software delivery lifecycle for mission-critical workloads, including:

Understanding applications written in legacy code (COBOL / PL/I)
Expediting test generation for developers
Proactively responding to incidents and enabling shift-left app resiliency
Automating compliance modernization for critical environments

Before examining each of these domains in detail, let us define what characterizes agent logic. Agent logic consists of software primitives, such as knowledge graphs, algorithms, and program analysis libraries, that operate at the agentic layer (within an agent harness) and intentionally steer the LLM in the direction of the enterprise workflow, reducing the context space. In doing so, they tend strongly to drive more performant outcomes in a more cost-effective manner.

Let us now examine how agent logic achieves such outcomes in each of the four domains above.

Understanding applications written in legacy code (COBOL / PL/I): program analysis

IBM watsonx Code Assistant for Z (WCA4Z), used to accelerate mainframe application development and modernization with AI and automation, is equipped with an App Insights agent for application understanding, one of the primary focus areas for enterprise clients running mission-critical workloads on IBM mainframes.

This agent leverages deep static analysis across the application and stores a pre-indexed representation in a database schema spanning hundreds of interrelated tables with complex semantics. The agent can therefore retrieve precise, structured information that is already available, which improves answer accuracy, reduces token usage, and minimizes back-and-forth interactions with the language model (Mistral Medium 250B in this instance). Applied to multiple mission-critical legacy systems (up to 1M lines of code and 1K programs), this approach achieves marginally superior application-understanding performance with ~30× lower token consumption than a baseline frontier LLM-only approach.
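
To make the idea of a pre-indexed representation concrete, here is a minimal Python sketch. It is not the WCA4Z schema or API: the single `calls` table, the program names, and the helper functions `build_index`, `callers_of`, and `build_prompt` are all hypothetical, chosen only to illustrate the pattern. The point is that the agent answers a structural question by querying facts computed ahead of time, then passes the model a few lines of context instead of raw source code.

```python
import sqlite3

# Hypothetical facts a static analyzer might emit ahead of time.
# Program names and the one-table schema are illustrative assumptions,
# not the actual WCA4Z database schema.
CALL_EDGES = [
    ("PAYROLL1", "TAXCALC"),
    ("PAYROLL1", "DBWRITE"),
    ("BILLING2", "TAXCALC"),
    ("REPORT03", "DBREAD"),
]


def build_index(edges):
    """Load pre-computed call edges into an in-memory SQLite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL)")
    conn.executemany("INSERT INTO calls (caller, callee) VALUES (?, ?)", edges)
    conn.commit()
    return conn


def callers_of(conn, program):
    """Return the programs that call `program`, sorted by name."""
    rows = conn.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        (program,),
    ).fetchall()
    return [row[0] for row in rows]


def build_prompt(question, program, callers):
    """Assemble a compact prompt from retrieved facts instead of raw source."""
    facts = ", ".join(callers) if callers else "none found in the index"
    return (
        f"Question: {question}\n"
        f"Known fact from static analysis: callers of {program}: {facts}\n"
        "Answer using only the facts above."
    )


if __name__ == "__main__":
    conn = build_index(CALL_EDGES)
    callers = callers_of(conn, "TAXCALC")
    print(build_prompt("Which programs are affected if TAXCALC changes?",
                       "TAXCALC", callers))
```

In this sketch the prompt grows with the number of relevant facts rather than with the size of the codebase, which is the intuition behind the lower token consumption described above. A real schema with hundreds of interrelated tables would need far richer queries than this single lookup.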
