Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction

Abstract: End-to-end automatic speech recognition (ASR) systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. Retrieval-augmented generation (RAG) frameworks can mitigate these errors with large language models, but existing architectures either rely on standard sparse retrieval, which ignores phonetic misrecognitions, or use heavyweight cross-modal embeddings that introduce high latency.

This letter proposes an efficient, purely lexical, error-aware framework designed to explicitly resolve phonetic and loop hallucinations. Our approach integrates a symmetric text normalization module with a novel error-aware term frequency-inverse document frequency (TF-IDF) algorithm. By constructing a sparse diagonal penalty matrix from historical errors, the retriever prioritizes corrective documents that contain specific high-risk misrecognitions.
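To make the retrieval idea concrete, the following is a minimal sketch of one way a diagonal error-weight matrix could be folded into TF-IDF scoring. It is an illustration under stated assumptions, not the letter's implementation: the function names, the `alpha` parameter, the NFKC-plus-lowercase normalization, the smoothed IDF variant, and the choice to up-weight (rather than down-weight) historically misrecognized terms are all hypothetical. The diagonal matrix is stored as a dictionary, so terms without an entry implicitly have weight 1.

```python
import math
import unicodedata
from collections import Counter


def normalize(text):
    """Tokenize after NFKC + lowercasing; the same function is applied to
    queries and documents (an assumed reading of 'symmetric' normalization)."""
    return unicodedata.normalize("NFKC", text).lower().split()


def build_idf(docs_tokens):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; out-of-vocabulary terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def penalty_diagonal(error_counts, alpha=1.0):
    """Diagonal entries for historically misrecognized terms (hypothetical
    weighting: 1 + alpha * count / max_count). Keys are assumed normalized."""
    if not error_counts:
        return {}
    peak = max(error_counts.values())
    return {t: 1.0 + alpha * c / peak for t, c in error_counts.items()}


def score(q, d, diag):
    """Cosine-style score q^T P d / (|q| |d|) with diagonal P."""
    dot = sum(w * diag.get(t, 1.0) * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs, error_counts, alpha=1.0):
    """Return document indices sorted from best to worst match."""
    docs_tokens = [normalize(doc) for doc in docs]
    idf = build_idf(docs_tokens)
    diag = penalty_diagonal(error_counts, alpha)
    q = tfidf(normalize(query), idf)
    scores = [score(q, tfidf(tokens, idf), diag) for tokens in docs_tokens]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy data, invented for illustration only.
docs = [
    "the flight arrives in the city today",
    "tehron is a frequent misrecognition of tehran",
]
query = "the flight to tehron today"

print(rank(query, docs, {}))                          # plain TF-IDF: [0, 1]
print(rank(query, docs, {"tehron": 5}, alpha=3.0))    # error-aware:  [1, 0]
```

In this toy case, plain TF-IDF prefers the document that shares more ordinary words with the hypothesis, while the weighted score promotes the corrective document that contains the known misrecognition. Because the matrix is diagonal and sparse, the extra cost is one dictionary lookup per query term.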

Evaluated on the Persian subset of the FLEURS dataset, our method increased the error-aware hit rate from 53.7% to 90.9%. In end-to-end evaluations, the integrated framework reduced the final word error rate (WER) from 23.06% to 18.83%, a 4.23-point absolute reduction, with near-zero inference latency.
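For readers less familiar with the metric, the sketch below shows the standard definition of word error rate: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; the letter's exact scoring pipeline is not described here.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5

# Relative reduction implied by the reported figures (23.06% to 18.83%).
relative_reduction = round((23.06 - 18.83) / 23.06 * 100, 1)
print(relative_reduction)  # 18.3
```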