MARD: Mirror-Augmented Reasoning Distillation for Mechanism-Level Drug-Drug Interaction Prediction


Abstract: Mechanism-level drug-drug interaction (DDI) prediction requires identifying which enzyme or pharmacodynamic axis is implicated, in which direction, and with which evidence, not merely whether two drugs interact. We introduce a reproducible mechanism-level DDI labelling and evaluation protocol with a structured taxonomy of 7 families and 147 subtypes, leakage-safe cold-split protocols, and auditable reasoning metrics for evaluating pharmacological prediction beyond flat interaction classification.
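To make the idea of a leakage-safe cold split concrete, the following is a minimal sketch of one common drug-level variant, in which a held-out set of drugs never appears in any training pair. The function name `cold_drug_split`, the `test_frac` and `seed` parameters, and the three-way train/test/mixed bucketing are illustrative assumptions; the abstract does not specify how the actual protocols are constructed.

```python
import random


def cold_drug_split(pairs, test_frac=0.2, seed=0):
    """Split drug pairs so that no held-out drug appears in a training pair.

    Hypothetical helper: `pairs` is an iterable of (drug_a, drug_b) identifiers.
    Returns (train, test, mixed), where `mixed` holds pairs with exactly one
    held-out drug; these are kept out of training to avoid leakage.
    """
    pairs = list(pairs)
    drugs = sorted({d for pair in pairs for d in pair})
    rng = random.Random(seed)
    rng.shuffle(drugs)

    # Hold out a fraction of drugs (at least one) as "cold" drugs.
    n_test = max(1, int(len(drugs) * test_frac))
    held_out = set(drugs[:n_test])

    train, test, mixed = [], [], []
    for a, b in pairs:
        n_cold = (a in held_out) + (b in held_out)
        if n_cold == 0:
            train.append((a, b))   # both drugs seen in training
        elif n_cold == 2:
            test.append((a, b))    # both drugs unseen: strictest setting
        else:
            mixed.append((a, b))   # one unseen drug: separate evaluation bucket
    return train, test, mixed
```

Keeping the mixed bucket separate, rather than folding it into training, is what prevents a held-out drug from leaking into the training data through a partner drug.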

We propose a pipeline that produces MARD (Mirror-Augmented Reasoning Distillation), a 7B-parameter reasoning model, by combining three training innovations: a single-token KL divergence on the direction tag that constrains the model’s prediction, per-loss PRM-weighted DPO with programmatic hard negatives, and a leakage-safe, mechanism-aware retrieval channel. Process-reward step labels are automatically verifiable against DrugBank structured fields, requiring no human or LLM judges.
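As a rough illustration of two of these ingredients, the sketch below computes a KL divergence between two distributions over a single direction-tag token and a standard DPO loss scaled by a scalar weight. Everything specific here is an assumption made for illustration: the two-tag vocabulary, the choice of reference distribution for the KL term, the way a PRM score enters as a multiplicative `weight`, and the default `beta`. The abstract does not give the actual formulation.

```python
import math


def direction_kl(p, q, eps=1e-12):
    """KL(p || q) between two distributions over direction tags.

    `p` and `q` map tag -> probability at the single tag position
    (hypothetical tags, e.g. {"increase": 0.9, "decrease": 0.1}).
    """
    return sum(
        pv * (math.log(pv + eps) - math.log(q.get(tag, 0.0) + eps))
        for tag, pv in p.items()
        if pv > 0.0
    )


def weighted_dpo_loss(logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected,
                      beta=0.1, weight=1.0):
    """Standard DPO loss for one preference pair, scaled by `weight`.

    `weight` stands in for a process-reward-model score (an assumption).
    Inputs are sequence log-probabilities under the policy and the
    frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) computed stably as softplus(-margin).
    softplus = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return weight * softplus
```

When the policy and reference agree on both responses, the margin is zero and the unweighted loss equals log 2; raising the policy's log-probability of the chosen response lowers the loss.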

On the April-2026 DrugBank release, MARD-7B is the only system in a 32-system comparison whose accuracy survives drug-pair novelty, beating the best baseline by 13.9 pp and GPT-4o by 6.7 pp at roughly 1% of frontier API cost. Further analysis reveals an anti-memorisation signature in which accuracy improves on rarely seen drugs, suggesting that the gain comes from structured pharmacological reasoning rather than drug-frequency memorisation. We release the corpus, DDI-PRM, retrieval index, and training code.