Effective Performance Measurement: Challenges and Opportunities in KPI Extraction from Earnings Calls

Abstract: Earnings calls are a key source of financial information about public companies. However, extracting information from these calls is difficult. Unlike the templatic filings required by the U.S. Securities and Exchange Commission (SEC) to report a company’s financial situation, earnings conference calls have no built-in labels, are unstructured, and feature conversational language.

We explore this challenging domain by assessing the information captured both by models trained on SEC filings and by in-context learning methods. To establish a baseline, we first evaluate how well SEC-trained models generalize across established SEC datasets.

To support our investigation, we introduce three novel benchmarks: (1) the SEC Filings Benchmark (SECB), (2) the Earnings Calls Benchmark (ECB), and (3) ECB-A, a subset with 2,460 expert annotation groups used for our qualitative analysis. We find that encoder-based models struggle with the domain shift.

Finally, we propose a system that uses LLMs to perform open-ended extraction from unstructured call transcripts. Verified by human evaluation (79.7% precision), the system provides a baseline for this valuable domain by consistently tracking emergent KPIs.
