Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

Modern analytics systems are fundamentally reactive: users must formulate queries in advance over data that is increasingly complex and continuously evolving. In real-time streaming environments this paradigm breaks down, because the space of potential insights is too large to enumerate by hand.

We present a multi-agent architecture for autonomous insight discovery over real-time data streams. The system runs a continuous discovery loop in which agents generate hypotheses, compile them into executable analytics, validate the generated artifacts, and produce visualizations and deployable applications.
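
The loop can be illustrated with a minimal, single-process sketch in Python. It assumes that the four agent roles (hypothesis generation, compilation, validation, rendering) can be stood in for by plain functions. No LLM, Kafka, or Flink is involved, and every name here (`Hypothesis`, `compile_hypothesis`, `discovery_iteration`, the fixed threshold of 100) is hypothetical rather than taken from the system itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical stand-ins for the agent roles; in the described system these
# would be LLM-backed agents coordinated through events, not local functions.

@dataclass
class Hypothesis:
    text: str
    metric: str
    threshold: float

@dataclass
class CompiledJob:
    hypothesis: Hypothesis
    run: Callable[[List[dict]], float]  # computes the metric over a window of events

@dataclass
class Insight:
    hypothesis: Hypothesis
    value: float
    chart: str

def generate_hypotheses(schema: Iterable[str]) -> List[Hypothesis]:
    # Hypothesis role: propose one "mean exceeds threshold" check per field.
    return [Hypothesis(f"mean {f} is unusually high", f, 100.0) for f in schema]

def compile_hypothesis(h: Hypothesis) -> CompiledJob:
    # Compilation role: turn the hypothesis into an executable aggregate.
    def run(window: List[dict]) -> float:
        values = [e[h.metric] for e in window if h.metric in e]
        return sum(values) / len(values) if values else 0.0
    return CompiledJob(h, run)

def validate(job: CompiledJob, sample: List[dict]) -> bool:
    # Validation role: dry-run on a sample; reject jobs that raise or return NaN/inf.
    try:
        return math.isfinite(job.run(sample))
    except Exception:
        return False

def render(job: CompiledJob, value: float) -> Optional[Insight]:
    # Rendering role: emit a text "chart" only when the hypothesis holds.
    if value <= job.hypothesis.threshold:
        return None
    bar = "#" * min(int(value // 10), 40)
    return Insight(job.hypothesis, value, f"{job.hypothesis.metric} {bar} {value:.1f}")

def discovery_iteration(schema: Iterable[str], window: List[dict]) -> List[Insight]:
    insights = []
    for h in generate_hypotheses(schema):
        job = compile_hypothesis(h)
        if not validate(job, window[:10]):
            continue
        insight = render(job, job.run(window))
        if insight is not None:
            insights.append(insight)
    return insights

def discovery_loop(schema: List[str], windows: Iterable[List[dict]]) -> Iterator[List[Insight]]:
    # The "continuous" part: re-run discovery on every new window of events.
    for window in windows:
        yield discovery_iteration(schema, window)
```

For example, with two events whose `amount` values are 120 and 150, `discovery_iteration(["amount"], window)` returns one insight with a mean of 135.0, while a window whose mean stays at or below the threshold yields an empty list.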

The architecture uses Apache Kafka for event-driven coordination, Apache Flink for stream processing, and large language models (LLMs) to implement the specialized agents. A key contribution is a contract-driven design based on typed intermediate artifacts, which enables modularity, observability, lineage tracking, and safer execution of dynamically generated analytics.
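
To make the idea of typed intermediate artifacts concrete, the following sketch assumes that each artifact carries a kind, a payload, and a pointer to the artifact it was derived from, and that a contract per kind states which parent kind is allowed and which payload check must pass. The names `Artifact`, `ArtifactStore`, `CONTRACTS`, and the `flink_sql` read-only check are hypothetical. The JSON string returned by `publish` stands in for a message that could be written to a Kafka topic, but no Kafka client is used.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Artifact:
    kind: str                        # e.g. "hypothesis", "flink_sql", "validation_report"
    payload: Dict[str, object]
    parent_id: Optional[str] = None  # lineage link to the artifact this was derived from
    artifact_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def _sql_is_read_only(p: Dict[str, object]) -> bool:
    # Illustrative check only: a single SELECT statement. Not a real SQL sandbox.
    q = str(p.get("sql", "")).strip()
    return q.upper().startswith("SELECT") and ";" not in q

# Hypothetical contracts: kind -> (required parent kind, payload check).
CONTRACTS: Dict[str, Tuple[Optional[str], Callable[[Dict[str, object]], bool]]] = {
    "hypothesis": (None, lambda p: isinstance(p.get("statement"), str)),
    "flink_sql": ("hypothesis", _sql_is_read_only),
    "validation_report": ("flink_sql", lambda p: isinstance(p.get("passed"), bool)),
}

class ArtifactStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, Artifact] = {}

    def publish(self, a: Artifact) -> str:
        # Enforce the contract before the artifact becomes visible downstream.
        if a.kind not in CONTRACTS:
            raise ValueError(f"unknown artifact kind: {a.kind}")
        parent_kind, check = CONTRACTS[a.kind]
        parent = self._by_id.get(a.parent_id) if a.parent_id else None
        if parent_kind is None:
            if a.parent_id is not None:
                raise ValueError(f"{a.kind} must not have a parent")
        elif parent is None or parent.kind != parent_kind:
            raise ValueError(f"{a.kind} must derive from a {parent_kind}")
        if not check(a.payload):
            raise ValueError(f"payload violates the {a.kind} contract")
        self._by_id[a.artifact_id] = a
        return json.dumps(asdict(a))  # message body that could be sent to a topic

    def lineage(self, artifact_id: str) -> List[str]:
        # Walk parent links back to the root, returning artifact kinds in order.
        kinds: List[str] = []
        current = self._by_id.get(artifact_id)
        while current is not None:
            kinds.append(current.kind)
            current = self._by_id.get(current.parent_id) if current.parent_id else None
        return kinds
```

Rejecting an artifact at publish time, before any generated query reaches the stream processor, is one way a contract can make execution of generated analytics safer; the `SELECT`-prefix test here is deliberately crude and is no substitute for a real sandbox.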

Through use cases in retail, finance, and public data, we show how this architecture supports a shift from query-driven analytics to proactive, discovery-driven systems.