Analysing Lightweight Large Language Models for Biomedical Named Entity Recognition on Diverse Output Formats

Abstract: Despite their strong linguistic capabilities, Large Language Models (LLMs) are computationally demanding and require substantial resources for fine-tuning, which makes them ill-suited to the privacy and budget constraints of many healthcare settings.

To address this, we present an experimental analysis of Biomedical Named Entity Recognition with lightweight LLMs, evaluating the impact of different output formats on model performance.

The results reveal that lightweight LLMs can achieve performance competitive with that of larger models, highlighting their potential as resource-efficient yet effective alternatives for biomedical information extraction.

Our analysis shows that instruction tuning over many distinct formats does not improve performance, but it does identify several formats that are consistently associated with better performance.

Paper Details:

Authors: Pierre Epron (HeKA | U1346, DIG), Adrien Coulet (HeKA | U1346), Mehwish Alam (IP Paris, DIG)
arXiv ID: 2604.25920
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Submission Date: 27 Mar 2026
