REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

Abstract: The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with the risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer’s disease (AD).

A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning.

We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk.
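
As a rough illustration of this idea, the sketch below computes soft multi-positive weights from intra-modality cosine similarities. It is not the authors' implementation: the function name `soft_positive_weights`, the mixing coefficient `alpha`, the temperature `tau`, and the choice of a row-wise softmax as the aggregation operator are all assumptions made for the example. NumPy is used for clarity; every operation is smooth, so the same computation written in an automatic-differentiation framework would let gradients flow through the weights.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def soft_positive_weights(img_emb, txt_emb, tau=0.1, alpha=0.5):
    """Hypothetical soft multi-positive weights for a batch of N subjects.

    img_emb: (N, d_img) retinal image embeddings.
    txt_emb: (N, d_txt) risk-profile embeddings.
    tau:     assumed temperature controlling how sharp the weights are.
    alpha:   assumed mixing weight between the two modalities.
    Returns an (N, N) row-stochastic matrix of graded positive weights.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    sim_img = img @ img.T  # image-to-image (intra-modality) similarity
    sim_txt = txt @ txt.T  # profile-to-profile (intra-modality) similarity
    combined = alpha * sim_img + (1.0 - alpha) * sim_txt
    # Row-wise softmax (assumed aggregation): a smooth alternative to
    # hard cluster assignment.
    logits = combined / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

Under these assumptions, lowering `tau` concentrates each row on the subject itself and recovers ordinary one-to-one pairing in the limit, while raising it spreads weight across phenotypically similar subjects.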

We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines.
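
One plausible form of such a soft-target objective is a symmetric cross-entropy between the cross-modal similarity distribution and a matrix of soft targets. The sketch below is a minimal example under that assumption and is not taken from the paper: the function name `soft_target_contrastive_loss`, the `temperature` value, and the reuse of one row-stochastic `targets` matrix for both directions are illustrative choices.

```python
import numpy as np

def log_softmax(z, axis=1):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def soft_target_contrastive_loss(img_emb, txt_emb, targets, temperature=0.07):
    """Hypothetical symmetric contrastive loss with soft targets.

    img_emb, txt_emb: (N, d) embeddings projected into a shared space.
    targets:          (N, N) row-stochastic soft positive weights;
                      row i gives the weight of each subject as a
                      positive for subject i (assumed for both directions).
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # cross-modal similarities
    # Cross-entropy against soft targets, image-to-text and text-to-image.
    loss_i2t = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

With `targets` set to the identity matrix, this expression reduces to the standard one-positive image-text contrastive loss; graded targets instead reward partial alignment with phenotypically similar subjects.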

By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.
