MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

Abstract: We find that explicit reasoning does not necessarily improve multimodal emotion recognition (MER) accuracy, even though it makes predictions more interpretable. Specifically, for reasoning-based multimodal large language models (MLLMs), fast thinking, which elicits a direct answer, often outperforms slow thinking, which answers only after deliberative reasoning. Our empirical analyses show that fast thinking improves recall by making broader and more confident predictions, whereas slow thinking favors precision by conservatively filtering out incorrect categories.
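
To make the recall/precision contrast concrete, the following is a minimal sketch of set-level precision and recall for a single sample. The emotion labels and the two prediction sets are hypothetical values chosen for illustration only; they are not taken from the paper or its benchmarks.

```python
def precision_recall(predicted, gold):
    """Set-level precision and recall for one sample's emotion labels."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical labels, for illustration only.
gold = {"happy", "surprised", "excited"}
fast = {"happy", "surprised", "excited", "nervous"}  # broad prediction
slow = {"happy"}                                     # conservative prediction

fast_p, fast_r = precision_recall(fast, gold)
slow_p, slow_r = precision_recall(slow, gold)
print(f"fast: precision={fast_p:.2f}, recall={fast_r:.2f}")
print(f"slow: precision={slow_p:.2f}, recall={slow_r:.2f}")
```

In this toy case the broad prediction covers every gold label but includes one wrong label, while the conservative prediction is fully correct but misses two gold labels, which mirrors the trade-off described above.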


Building on these insights, we propose MER-R1, a reinforcement learning framework that turns slow-fast complementarity into explicit optimization. Dual-objective disentanglement separates recall and precision into two optimization signals, allowing them to be jointly optimized rather than traded off against each other. Slow-fast confidence calibration further aligns the final slow-thinking answer with fast-thinking intuition, strengthening correct emotions while suppressing incorrect ones.
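
One way to picture separating recall and precision into two optimization signals is sketched below. This is an assumption made for illustration, not the paper's formulation: each signal is standardized on its own across a group of sampled responses before the two are summed, so that a difference in scale or variance between the signals does not let one dominate the other. The function names (`normalize`, `disentangled_advantages`) and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Standardize one reward signal across a group of sampled responses."""
    mu = mean(values)
    sigma = pstdev(values)
    # eps keeps the division defined when all values in the group are equal.
    return [(v - mu) / (sigma + eps) for v in values]


def disentangled_advantages(precisions, recalls):
    """Assumed scheme: normalize each signal separately, then add them."""
    p_adv = normalize(precisions)
    r_adv = normalize(recalls)
    return [p + r for p, r in zip(p_adv, r_adv)]


# Hypothetical per-response scores for a group of four sampled answers.
precisions = [1.0, 0.75, 0.5, 1.0]
recalls = [1 / 3, 1.0, 2 / 3, 2 / 3]

advantages = disentangled_advantages(precisions, recalls)
for i, a in enumerate(advantages):
    print(f"response {i}: advantage={a:+.3f}")
```

Under this assumed scheme, a response is rewarded for being above the group average on either signal, rather than on a single blended score in which the higher-variance signal would carry more weight.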


In this way, MER-R1 unifies the recall-oriented intuition of fast thinking with the precision-oriented selectivity of slow thinking. We further provide theoretical justification for this synergy, showing that it mitigates variance-induced interference during optimization. Extensive experiments on MER-UniBench and MME-Emotion show that MER-R1 achieves state-of-the-art performance and that, with this framework, reasoning genuinely benefits emotion recognition.
