OralAgent: Integrating Reasoning, Tools, and Knowledge for Interactive Dental Image Analysis

Abstract: Dental image analysis plays a pivotal role in supporting accurate diagnosis and treatment planning in oral healthcare. Although recent advances have produced dental AI models for specific tasks and individual imaging modalities, their isolated designs limit practical use in real-world clinical workflows.

In this paper, we present OralAgent, the first dental-specialized AI agent that unifies multimodal reasoning, tool-based decision-making, and knowledge-grounded retrieval within an end-to-end automated framework. It integrates 22 visual analysis tools and 368 widely used classic dental textbooks, enabling autonomous reasoning, planning, tool use, knowledge retrieval, and multi-step workflow execution.

Furthermore, we introduce OralCorpus, a large-scale, high-quality bilingual textual resource containing 134.8M tokens curated for dental retrieval-augmented generation (RAG). To evaluate models’ multidisciplinary dental knowledge, we construct OralQA-ZH, a Chinese multiple-choice question benchmark consisting of 798 items across eleven oral subspecialties.

Extensive experiments demonstrate that OralAgent achieves state-of-the-art performance on the MMOral-Uni, MMOral-OPG, and OralQA-ZH benchmarks, highlighting its effectiveness, interpretability, and adaptability in real-world clinical settings. The code and models are publicly available at this https URL.