RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

Abstract: We present RAG-Coding, an agentic method for automated ICD-10-CM coding. RAG-Coding orchestrates four large language model (LLM) agents and grounds their coding decisions in external knowledge sources (e.g., the official ICD-10-CM tabular list and coding guidelines). By retrieving and cross-referencing relevant knowledge from these sources, the agents improve coding accuracy and promote clinical compliance.

On the MDACE dataset, RAG-Coding outperforms the best LLM-based baseline by 8-13% in micro-F1 and 2-8% in macro-F1 across multiple LLM backbones. Compared with PLM-ICD, the state-of-the-art pretrained language model method, RAG-Coding achieves higher micro recall (+11%), whereas PLM-ICD achieves higher micro precision (+6%); the two methods therefore reach comparable micro- and macro-F1. Ablations show stepwise gains as components are added, highlighting the importance of incorporating external knowledge.
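
For readers less familiar with the reported metrics, the following is a minimal sketch of how micro- and macro-F1 are commonly computed for multi-label code assignment. It is not the evaluation code used for these results: the function names and the toy gold/predicted code sets are illustrative assumptions, and macro-averaging conventions differ between papers (here, per-code F1 is averaged over every code that appears in either the gold or the predicted labels).

```python
from collections import Counter


def prf(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_macro_f1(gold, pred):
    """gold, pred: lists of sets of codes, one set per document (hypothetical format)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:
            tp[code] += 1  # predicted and correct
        for code in p - g:
            fp[code] += 1  # predicted but not in gold
        for code in g - p:
            fn[code] += 1  # in gold but missed
    codes = set(tp) | set(fp) | set(fn)
    # Micro: pool counts over all codes, then compute a single F1.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))[2]
    # Macro: compute F1 per code, then take the unweighted mean.
    macro = sum(prf(tp[c], fp[c], fn[c])[2] for c in codes) / len(codes) if codes else 0.0
    return micro, macro


# Toy data for illustration only; not drawn from MDACE.
gold = [{"I10", "E11.9"}, {"I10"}, {"J45.909"}]
pred = [{"I10"}, {"I10", "E11.9"}, {"J45.909"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

Because micro-F1 pools counts across all codes, frequent codes dominate it, whereas macro-F1 weights every code equally and is therefore more sensitive to rare codes. This is one reason the two metrics can improve by different margins.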

We also release MDACE-2025, which updates the original dataset with expert re-annotations that follow the latest 2025 ICD-10-CM guidelines. This update provides more fine-grained code labels and enables evaluation against current clinical standards.
