Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

Abstract: We present a protocol to evaluate ChatGPT’s ability to generate disease-centric biomedical associations. It describes how to generate the associations, validate the biological entities against biomedical ontologies, and verify the associations against the literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address the limitations of exact-match ontology lookup, we provide a use case that performs semantic verification through a Retrieval-Augmented Generation (RAG) workflow powered by open-source large language models (LLMs). This lets LLMs check the factual accuracy of content generated by other LLMs and expose hallucinations.

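The self-consistency check and the cross-model vote named in the title can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the protocol: the function names, the "supported"/"unsupported" verdict labels, the strict-majority rule, and the idea that each model returns one verdict per association.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


def majority_vote(votes: Iterable[str]) -> Optional[str]:
    """Return the verdict held by a strict majority of voters, else None.

    Hypothetical helper: verdicts are compared case-insensitively, and a
    tie or an empty ballot yields None (no decision).
    """
    counts = Counter(v.strip().lower() for v in votes)
    total = sum(counts.values())
    if total == 0:
        return None
    label, n = counts.most_common(1)[0]
    return label if n * 2 > total else None


def self_consistency(runs: List[Set[str]]) -> Dict[str, float]:
    """Fraction of repeated generations in which each association appears.

    Hypothetical helper: each element of `runs` is the set of associations
    produced by one generation pass for the same prompt.
    """
    if not runs:
        return {}
    counts = Counter(item for run in runs for item in run)
    return {item: c / len(runs) for item, c in counts.items()}


if __name__ == "__main__":
    # Made-up verdicts from three verifier models for one association.
    verdicts = {"model_a": "supported", "model_b": "Supported", "model_c": "unsupported"}
    print(majority_vote(verdicts.values()))

    # Made-up outputs of three generation passes for the same disease.
    passes = [{"gene_x", "gene_y"}, {"gene_x"}, {"gene_x", "gene_z"}]
    print(self_consistency(passes))
```

Under these assumptions, an association that appears in every pass scores 1.0, and a verdict is accepted only when more than half of the verifier models agree on it; a different threshold or tie-breaking rule would be an equally plausible reading.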

Publication Details:

Authors: Ahmed Abdeen Hamed, Luis M. Rocha
Journal Reference: STAR Protocols, 2026; 7
DOI: 10.1016/j.xpro.2026.104533
arXiv: 2605.30400
