ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation

Abstract: Text-guided 3D medical image segmentation offers a flexible alternative to class-based and spatial-prompt-based models by allowing users to specify regions of interest directly in natural language. This paradigm avoids reliance on predefined label sets, reduces ambiguous outputs, and aligns more naturally with clinical workflows.

However, existing text-guided frameworks are often computationally expensive, exhibit weak text-volume feature alignment, and fail to capture fine anatomical details. We propose ESICA, a lightweight and scalable framework that addresses these challenges through three innovations: (1) a similarity-matrix-based mask prediction formulation that enhances semantic alignment, (2) an efficient decomposed decoder with adapter modules for accurate volumetric decoding, and (3) a two-pass refinement strategy that sharpens boundaries and resolves uncertain regions.
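As an illustration of the first innovation, the sketch below shows one plausible reading of similarity-matrix-based mask prediction: each text prompt embedding is compared against every voxel feature, and the resulting similarity matrix is turned into one binary mask per prompt. The function names, the cosine similarity, the `temperature` and `threshold` parameters, and the toy 2-D features are all assumptions made for this example; the abstract does not specify ESICA's actual formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_matrix(text_embs, voxel_feats):
    """Return a K x N matrix: one row per text prompt, one column per voxel."""
    return [[cosine(t, v) for v in voxel_feats] for t in text_embs]

def predict_masks(text_embs, voxel_feats, temperature=0.1, threshold=0.5):
    """Turn similarities into per-prompt binary masks (hypothetical decision rule)."""
    masks = []
    for row in similarity_matrix(text_embs, voxel_feats):
        # Sigmoid of the temperature-scaled similarity gives a per-voxel probability.
        probs = [1.0 / (1.0 + math.exp(-s / temperature)) for s in row]
        masks.append([1 if p > threshold else 0 for p in probs])
    return masks

# Toy data: two prompt embeddings and three flattened voxel features.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
voxel_feats = [[0.9, -0.1], [-0.1, 0.9], [-1.0, -1.0]]
masks = predict_masks(text_embs, voxel_feats)
```

In this toy case each prompt selects only the voxel whose feature points in its direction. In a real model the voxel features would come from the volumetric decoder and the masks would be reshaped back to the 3D grid.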

To improve training stability and generalization, ESICA adopts a two-stage scheme consisting of positive-only pretraining followed by balanced fine-tuning. On the CVPR BiomedSegFM benchmark spanning five imaging modalities (CT, MRI, PET, ultrasound, and microscopy), ESICA achieves state-of-the-art segmentation accuracy, while the compact ESICA4 Lite variant attains similar segmentation performance with substantially fewer parameters, yielding a superior efficiency-accuracy trade-off.
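The two-stage scheme can be pictured as two different ways of drawing training pairs. The sketch below assumes that a "positive" sample is a (volume, prompt) pair whose named structure is present in the volume, and that "balanced" means equal numbers of positive and negative pairs; both definitions, the `has_target` field, and the helper names are hypothetical and are not taken from the abstract.

```python
import random

def positive_only(samples):
    """Stage 1 (assumed): keep only pairs whose prompted structure is present."""
    return [s for s in samples if s["has_target"]]

def balanced(samples, rng):
    """Stage 2 (assumed): draw equal numbers of positive and negative pairs."""
    pos = [s for s in samples if s["has_target"]]
    neg = [s for s in samples if not s["has_target"]]
    n = min(len(pos), len(neg))
    batch = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(batch)
    return batch

# Toy dataset: six positive pairs and two negative pairs.
samples = [{"id": i, "has_target": i < 6} for i in range(8)]
rng = random.Random(0)  # fixed seed so the example is reproducible

stage1 = positive_only(samples)   # used for pretraining
stage2 = balanced(samples, rng)   # used for fine-tuning
```

Under these assumptions, the first stage never shows the model an empty target mask, while the second stage exposes it to absent structures at the same rate as present ones.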

Our framework advances text-guided segmentation toward efficient, scalable, and clinically deployable systems. Code will be made publicly available.