ModTGCN: Modularity-aware Graph Neural Networks for Text Classification

Abstract: Graph-based text classification models typically rely on local neighborhood aggregation and overlook global community structure, even though semantic document graphs exhibit strong class-consistent clustering. Ignoring this structure can blur class boundaries and lead to over-smoothing.

We propose ModTGCN, a modularity-aware graph neural network for text classification. It jointly optimizes a cross-entropy loss and a modularity-based auxiliary objective, which promotes class-coherent document communities while preserving discriminative representations.
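
The sketch below makes the joint objective concrete under one stated assumption: it uses a soft (differentiable) relaxation of Newman modularity, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), in which the hard indicator δ is replaced by the inner product of two documents' class-probability vectors. The total loss is assumed to be L = CE − λ·Q, so minimizing L maximizes modularity. The abstract does not specify the exact form. The names `soft_modularity` and `joint_loss`, the weight `lam`, and the use of softmax outputs as community assignments are all hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def soft_modularity(adj, probs):
    """Soft Newman modularity: (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * <p_i, p_j>.

    adj: symmetric weighted adjacency (list of lists); probs: per-node class probabilities.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # equals 2m for an undirected (symmetric) graph
    q = 0.0
    for i in range(n):
        for j in range(n):
            b_ij = adj[i][j] - degrees[i] * degrees[j] / two_m  # modularity matrix entry
            same = sum(pi * pj for pi, pj in zip(probs[i], probs[j]))  # soft delta(c_i, c_j)
            q += b_ij * same
    return q / two_m

def cross_entropy(probs, labels, train_idx):
    """Mean negative log-likelihood over labeled (training) documents only."""
    return -sum(math.log(probs[i][labels[i]]) for i in train_idx) / len(train_idx)

def joint_loss(logits, adj, labels, train_idx, lam=0.5):
    """Hypothetical joint objective: supervised CE minus lam * soft modularity over all documents."""
    probs = [softmax(row) for row in logits]
    return cross_entropy(probs, labels, train_idx) - lam * soft_modularity(adj, probs)

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
onehot = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
print(soft_modularity(adj, onehot))  # ≈ 0.357 (= 5/14)
```

With one-hot assignments the soft score reduces to ordinary modularity. When every document has the same distribution, the score is zero, because each row of the modularity matrix sums to zero. The auxiliary term therefore rewards only assignments that agree with the graph's community structure.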

The modularity term is computed on a document-document similarity graph built from transformer embeddings (pretrained or fine-tuned). To improve scalability, we decouple the original heterogeneous TextGCN graph into separate document-word and word-word components, which makes training 2x to 10x faster.
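
The next sketch shows one common way to build a document-document similarity graph from embeddings. It scores document pairs by cosine similarity, keeps each document's top-k neighbors, and symmetrizes the result. The abstract does not say how the graph is sparsified, so the kNN rule, the `k` and `min_sim` parameters, and the max-symmetrization are assumptions. The toy vectors stand in for transformer document embeddings such as pooled encoder outputs, which are also an assumption here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def knn_similarity_graph(embeddings, k=2, min_sim=0.0):
    """Hypothetical kNN graph: link each document to its k most similar other documents.

    Edge weights are cosine similarities. The graph is made symmetric by taking the max
    of the two directed weights, and edges with similarity <= min_sim are dropped.
    """
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        for s, j in sims[:k]:
            if s > min_sim:
                adj[i][j] = max(adj[i][j], s)
                adj[j][i] = max(adj[j][i], s)
    return adj

# Two documents near the x-axis and two near the y-axis.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_similarity_graph(emb, k=1)
print([[round(w, 3) for w in row] for row in graph])
```

Keeping only k neighbors per document bounds the number of edges at roughly k·n, which keeps a pairwise modularity term tractable on larger corpora. The dense loops here exist only for readability.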

We further study graph construction strategies, label-aware edge reweighting, and supervision choices for modularity optimization. Experiments on five benchmarks show consistent gains, with larger improvements on complex, low-homophily datasets such as Ohsumed and 20NG.
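
The last sketch illustrates two ideas from the paragraph above. The first is the standard edge-homophily ratio, which is the fraction of edges whose endpoints share a class label; "low homophily" means this ratio is small. The second is a hypothetical form of label-aware edge reweighting. The abstract does not define the paper's reweighting scheme, so the `boost` and `damp` factors and the rule of touching only edges between two training documents are illustrative assumptions.

```python
def edge_homophily(edges, labels):
    """Fraction of undirected edges (each listed once) whose endpoints share a label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def label_aware_reweight(adj, labels, train_mask, boost=2.0, damp=0.5):
    """Hypothetical reweighting of a weighted adjacency matrix.

    Edges between two labeled (training) documents are scaled by `boost` if the labels
    agree and by `damp` if they differ. Edges touching unlabeled documents keep their
    weight, so no test labels are used. Returns a new matrix and leaves `adj` unchanged.
    """
    n = len(adj)
    out = [row[:] for row in adj]
    for i in range(n):
        for j in range(n):
            if out[i][j] and train_mask[i] and train_mask[j]:
                out[i][j] *= boost if labels[i] == labels[j] else damp
    return out

# Path graph 0-1-2-3 with labels [0, 0, 1, 1]; document 3 is unlabeled during training.
labels = [0, 0, 1, 1]
print(edge_homophily([(0, 1), (1, 2), (2, 3)], labels))
adj = [[0, 1.0, 0, 0], [1.0, 0, 1.0, 0], [0, 1.0, 0, 1.0], [0, 0, 1.0, 0]]
print(label_aware_reweight(adj, labels, [True, True, True, False]))
```

Restricting the reweighting to training documents is a deliberate choice in this sketch. It strengthens class-consistent structure where labels are known and avoids leaking test labels into the graph.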
