"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support"

Abstract

We present OncoAgent, an open-source, privacy-preserving clinical decision support system for oncology. OncoAgent combines a dual-tier fine-tuned LLM architecture with a state-of-the-art multi-agent LangGraph topology, a four-stage Corrective RAG pipeline over 70+ physician-grade NCCN and ESMO guidelines, and a three-layer reflexion safety validator enforcing a strict Zero-PHI policy.

The system routes clinical queries through an additive complexity scorer to either a 9B parameter speed-optimised model (Tier 1) or a 27B deep-reasoning model (Tier 2), both fine-tuned via QLoRA on a corpus of 266,854 real and synthetically generated oncological cases using the Unsloth framework on AMD Instinct MI300X hardware (192 GB HBM3). Sequence packing on MI300X enabled full-dataset fine-tuning in approximately 50 minutes — a 56× throughput acceleration over API-based generation.
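
The routing step can be illustrated with a minimal sketch of an additive scorer. The features, weights, threshold, and tier names below are hypothetical placeholders chosen for illustration; the actual scoring rules are not specified here.

```python
from dataclasses import dataclass

# Hypothetical feature weights and cut-off (assumptions, not taken from the paper).
WEIGHTS = {
    "comorbidity": 2,    # per comorbidity mentioned
    "prior_therapy": 2,  # per prior line of therapy
    "metastatic": 3,     # metastatic disease flagged
    "long_query": 1,     # query longer than 60 words
}
TIER2_THRESHOLD = 5


@dataclass
class ClinicalQuery:
    text: str
    comorbidities: int = 0
    prior_therapies: int = 0
    metastatic: bool = False


def complexity_score(q: ClinicalQuery) -> int:
    """Sum independent per-feature contributions (the 'additive' part)."""
    score = WEIGHTS["comorbidity"] * q.comorbidities
    score += WEIGHTS["prior_therapy"] * q.prior_therapies
    score += WEIGHTS["metastatic"] * int(q.metastatic)
    score += WEIGHTS["long_query"] * int(len(q.text.split()) > 60)
    return score


def route(q: ClinicalQuery) -> str:
    """Send high-scoring queries to the larger model, the rest to the faster one."""
    return "tier2_27b" if complexity_score(q) >= TIER2_THRESHOLD else "tier1_9b"


simple = ClinicalQuery("Early-stage presentation, first-line options?")
complex_case = ClinicalQuery(
    "Progression after prior therapy with renal impairment and diabetes.",
    comorbidities=2,
    prior_therapies=1,
    metastatic=True,
)
print(route(simple), route(complex_case))
```

Because each feature contributes independently, the score for any routed query can be traced back to the individual terms that produced it.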

After the fix, Corrective RAG (CRAG) document grading achieved a 100% success rate with a mean RAG confidence score of at least 2.3. The complete system is fully open source and deployable on-premises, eliminating proprietary cloud API dependency and preserving patient data sovereignty.

1. Introduction

Oncology is one of the most information-dense and cognitively demanding domains in clinical medicine. The volume, heterogeneity, and rapid evolution of evidence-based guidelines — from the National Comprehensive Cancer Network (NCCN) to the European Society for Medical Oncology (ESMO) — create a persistent knowledge gap between published evidence and bedside practice.

AI-assisted clinical decision support systems hold transformative potential for closing this gap, yet most commercially available systems fail in three critical ways:

  1. Hallucinated recommendations not grounded in validated guidelines
  2. Cloud API dependency that precludes on-premises deployment in privacy-sensitive hospital environments
  3. Monolithic LLM architectures prone to context saturation under complex multi-comorbidity presentations

OncoAgent is designed around three core principles:

  • Architectural decomposition: Clinical reasoning is decomposed across eight specialised LangGraph nodes, each with a bounded, auditable function.
  • Grounded generation: All model outputs are anchored to a curated vector knowledge base through a four-stage retrieval pipeline with explicit relevance gating.
  • Hardware sovereignty: The full inference and training stack runs natively on AMD Instinct MI300X using ROCm and open-source frameworks — enabling hospital deployment without data exfiltration.
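
The first two principles can be sketched in plain Python without relying on the LangGraph API. The sketch assumes a shared state dictionary, three hypothetical nodes (`intake`, `retrieve`, `relevance_gate`), placeholder passages, and an arbitrary relevance threshold of 0.5; none of these names or values come from the system itself.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def intake(state: State) -> State:
    # Bounded function: only normalises the query text.
    return {"query": str(state["query"]).strip().lower()}


def retrieve(state: State) -> State:
    # Stand-in for the retrieval pipeline: each hit carries a relevance score.
    hits = [
        ("placeholder passage about EGFR-mutated NSCLC", 0.9),
        ("placeholder passage about general nutrition", 0.2),
    ]
    return {"hits": hits}


def relevance_gate(state: State) -> State:
    # Explicit gating: drop any passage below the assumed threshold.
    kept = [text for text, score in state["hits"] if score >= 0.5]
    return {"evidence": kept}


def run_graph(nodes: List[Node], state: State) -> State:
    """Run nodes in order and record which state keys each one wrote."""
    audit = []
    for node in nodes:
        update = node(state)
        audit.append((node.__name__, sorted(update)))
        state = {**state, **update}
    state["audit"] = audit
    return state


result = run_graph([intake, retrieve, relevance_gate], {"query": "  EGFR NSCLC  "})
print(result["evidence"])
print(result["audit"])
```

Each node returns only the keys it is responsible for, so the audit trail shows exactly which step produced which part of the final state.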

2. Related Work

2.1 Clinical LLMs and Decision Support

Large language models have demonstrated significant promise in clinical NLP tasks including diagnostic coding, literature summarisation, and patient communication. Domain-specific fine-tuning approaches (exemplified by BioMedLM, Med-PaLM 2, and ClinicalBERT) consistently improve performance on medical benchmarks over general-purpose models. OncoAgent extends this line of work by targeting the specific subdomain of oncological triage and treatment pathway recommendation, where hallucination consequences are most severe.

2.2 Multi-Agent Architectures

Decomposed multi-agent systems have emerged as a principled approach to complex reasoning tasks. OncoAgent synthesises four canonical SOTA patterns:

  • Claude Code pattern: deterministic safety harnesses separated from LLM reasoning
  • Hermes Agent pattern: structured tool-calling with per-session memory isolation
  • Corrective RAG (Yan et al., 2024): document relevance grading and query reformulation
  • Reflexion (Shinn et al., 2023): self-correcting generation via feedback-augmented retry loops
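
Two of these patterns, the deterministic safety harness and the Reflexion-style retry loop, combine naturally. The sketch below is an illustration under stated assumptions: the two PHI regular expressions, the `[source:` citation marker, and the stub generator are all hypothetical, and a real validator would need far broader coverage.

```python
import re
from typing import Callable, List, Tuple

# Assumed PHI patterns, for illustration only.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-like number
    re.compile(r"\bMRN[:\s]*\d+\b", re.I),   # medical record number
]


def safety_check(draft: str) -> List[str]:
    """Deterministic harness: return a list of violations (empty means pass)."""
    problems = []
    if any(p.search(draft) for p in PHI_PATTERNS):
        problems.append("remove patient identifiers")
    if "[source:" not in draft:
        problems.append("cite a guideline source")
    return problems


def reflexion_loop(
    generate: Callable[[str, List[str]], str], query: str, max_retries: int = 3
) -> Tuple[str, int]:
    """Regenerate with validator feedback until a draft passes or retries run out."""
    feedback: List[str] = []
    for attempt in range(1, max_retries + 1):
        draft = generate(query, feedback)
        problems = safety_check(draft)
        if not problems:
            return draft, attempt
        feedback = problems  # fed back into the next attempt
    raise RuntimeError("no draft passed validation")


def stub_generate(query: str, feedback: List[str]) -> str:
    # Stand-in for an LLM call: the first draft fails, the revised draft passes.
    if not feedback:
        return "Patient MRN: 12345 should receive option A."
    return "Option A is recommended [source: placeholder guideline]."


draft, attempts = reflexion_loop(stub_generate, "placeholder query")
print(attempts, draft)
```

Keeping `safety_check` free of any model call means the pass/fail decision is reproducible, while the model is only asked to revise.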

2.3 Retrieval-Augmented Generation in Medicine

Standard bi-encoder retrieval is ill-suited for clinical domains where terminological precision is critical (e.g., “tyrosine kinase inhibitor” vs. “TKI”). OncoAgent implements a multi-stage pipeline with cross-encoder re-ranking and integrates Hypothetical Document Embeddings (HyDE; Gao et al., 2022). HyDE embeds an LLM-generated hypothetical answer passage in place of the raw query, which projects natural-language queries into the guideline embedding space and resolves medical synonym mismatches.
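
The synonym problem and the HyDE remedy can be shown with a toy example. This sketch substitutes a bag-of-words vector for a dense embedding model and a fixed string for the LLM-generated hypothetical document; the passages and function names are placeholders, and the re-ranking stage is omitted.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy stand-in for a dense encoder: lower-cased word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


GUIDELINES = [
    "placeholder passage on tyrosine kinase inhibitor therapy selection",
    "placeholder passage on radiotherapy dose scheduling",
]


def hypothetical_document(query: str) -> str:
    # Stand-in for an LLM call that drafts a guideline-style answer to the query.
    return "Draft answer: tyrosine kinase inhibitor therapy selection (placeholder text)"


def scores(query: str, use_hyde: bool) -> List[float]:
    """Score every guideline passage against the raw query or its HyDE probe."""
    probe = embed(hypothetical_document(query) if use_hyde else query)
    return [cosine(probe, embed(g)) for g in GUIDELINES]


query = "Which TKI is appropriate?"
print(scores(query, use_hyde=False))
print(scores(query, use_hyde=True))
```

The raw query shares no terms with either passage, so both scores are zero, whereas the hypothetical document spells out the abbreviation and ranks the matching passage first.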
