MARD: Mirror-Augmented Reasoning Distillation for Mechanism-Level Drug-Drug Interaction Prediction

Abstract: Mechanism-level drug-drug interaction (DDI) prediction requires identifying which enzyme or pharmacodynamic axis is implicated, in which direction, and with which evidence, not merely whether two drugs interact. We introduce a reproducible mechanism-level DDI labelling and evaluation protocol with a structured taxonomy of 7 families and 147 subtypes, leakage-safe cold splits, and auditable reasoning metrics for evaluating pharmacological prediction beyond flat interaction classification.
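
As a rough illustration of what a leakage-safe cold split can look like, the sketch below partitions interaction records at the drug level rather than the pair level, so that held-out pairs involve drugs never seen in training. The abstract does not specify the exact protocol; the hash-based assignment, the function names `drug_bucket` and `cold_split`, the three split names, and the toy drug IDs are all assumptions made for illustration.

```python
import hashlib


def drug_bucket(drug_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a drug to 'train' or 'test' by hashing its ID.

    Hypothetical helper: hashing keeps the assignment stable across runs
    without storing a random seed.
    """
    digest = hashlib.sha256(drug_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if u < test_fraction else "train"


def cold_split(pairs, test_fraction: float = 0.2):
    """Split (drug_a, drug_b, label) records at the drug level.

    Returns a dict with three lists:
      train     : both drugs are training drugs
      cold_one  : exactly one drug is unseen during training
      cold_both : both drugs are unseen during training
    """
    splits = {"train": [], "cold_one": [], "cold_both": []}
    for a, b, label in pairs:
        # Count how many of the two drugs fall in the held-out drug set.
        n_unseen = sum(drug_bucket(d, test_fraction) == "test" for d in (a, b))
        key = ("train", "cold_one", "cold_both")[n_unseen]
        splits[key].append((a, b, label))
    return splits


if __name__ == "__main__":
    # Toy records with made-up IDs and a placeholder label.
    toy_pairs = [(f"D{i:03d}", f"D{j:03d}", "toy_label")
                 for i in range(10) for j in range(i + 1, 10)]
    for name, rows in cold_split(toy_pairs).items():
        print(name, len(rows))
```

Because every record is routed by the bucket of its drugs, no drug in `cold_both` can appear in `train`; whether the actual protocol also controls for structural or mechanism-level similarity between train and test drugs is not stated here.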

We propose MARD (Mirror-Augmented Reasoning Distillation), a pipeline that produces a 7B-parameter reasoning model by combining three training innovations: a single-token KL divergence on the direction tag that constrains the model's prediction, per-loss PRM-weighted DPO with programmatic hard negatives, and a leakage-safe mechanism-aware retrieval channel. Process-reward step labels are automatically verifiable against DrugBank's structured fields, requiring no human or LLM judges.
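
To make the PRM-weighted DPO component more concrete, the following sketch computes the standard DPO loss for one preference pair and scales it by a scalar weight. How MARD derives and applies the weight is not described in this abstract, so `prm_weight` (for example, a process-reward score for the chosen reasoning trace) and the function names are hypothetical; only the unweighted loss follows the usual DPO formulation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy and under the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


def prm_weighted_dpo_loss(logp_chosen: float, logp_rejected: float,
                          ref_logp_chosen: float, ref_logp_rejected: float,
                          prm_weight: float, beta: float = 0.1) -> float:
    """DPO loss scaled by a non-negative weight (assumed to come from a PRM)."""
    if prm_weight < 0:
        raise ValueError("prm_weight must be non-negative")
    return prm_weight * dpo_loss(logp_chosen, logp_rejected,
                                 ref_logp_chosen, ref_logp_rejected, beta)


if __name__ == "__main__":
    # Toy log-probabilities; the rejected response plays the role of a
    # programmatic hard negative (e.g. a trace with a flipped direction tag).
    print(prm_weighted_dpo_loss(-12.0, -15.0, -13.0, -14.0, prm_weight=0.8))
```

Under this assumed weighting, pairs whose chosen trace the PRM scores highly contribute more gradient, while a weight of zero removes a pair from the objective entirely.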

On the April 2026 DrugBank release, MARD-7B is the only system in a 32-system comparison whose accuracy survives drug-pair novelty, beating the best baseline by +13.9 pp and GPT-4o by +6.7 pp at roughly 1% of frontier API cost. Further analysis reveals an anti-memorisation signature in which accuracy improves on rarely seen drugs, suggesting that the gain comes from structured pharmacological reasoning rather than drug-frequency memorisation. We release the corpus, DDI-PRM, retrieval index, and training code.
