Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

Abstract: Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal) or multiple representations (multiview). However, the field lacks a unified framework, a quantitative synthesis of its effectiveness, and clear guidance for practitioners. This systematic review addresses these gaps by analysing 139 primary studies.

The review introduces a formal framework to structure the field, presents a qualitative analysis that identifies key trends, and performs a random-effects meta-analysis (to our knowledge, the first focused on document classification) to quantify performance gains. Our meta-analysis shows that multimodal fusion significantly improves accuracy (mean gain of +5.28 percentage points, $p=0.0016$); the F1-score effect is directionally positive but not statistically significant in our primary model.
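
To illustrate what a random-effects pooling step involves, the following is a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which the abstract does not specify, and the function name, its inputs, and the example numbers are all hypothetical; none of them are data or methods taken from the reviewed studies.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    effects   -- per-study gains (e.g. accuracy gain in percentage points)
    variances -- per-study sampling variances of those gains
    (Both inputs are hypothetical placeholders for illustration.)
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q quantifies heterogeneity between studies.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's own variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # Two-sided p-value under a standard normal approximation.
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "z": z,
        "p": p,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


if __name__ == "__main__":
    # Made-up gains and variances, for demonstration only.
    result = random_effects_pool([4.1, 6.3, 2.8, 7.9], [1.2, 2.0, 0.9, 3.1])
    print(result)
```

Compared with a fixed-effect model, the random-effects weights are flatter, so a single large study dominates the pooled estimate less when the studies disagree with one another.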

Multiview fusion provides consistent but modest gains in accuracy (+4.67%), F1-score (+3.08%), and recall (all $p<0.05$). Critically, our qualitative synthesis uncovers challenges in reproducibility and methodological rigour: only 11.8% of multimodal studies and 23.3% of multiview studies use statistical tests to validate their findings, which undermines the reliability of many reported results.

This review’s primary contributions are a unifying framework, the first quantitative evidence base, and data-driven guidelines. We conclude that successful information fusion depends not on algorithmic complexity but on strategically aligning the fusion method with the task context and on committing to more rigorous validation.
