ModTGCN: Modularity-aware Graph Neural Networks for Text Classification
Abstract: Graph-based text classification models typically rely on local neighborhood aggregation and overlook global community structure, even though semantic document graphs exhibit strong class-consistent clustering. Ignoring this structure can blur class boundaries and lead to over-smoothing.
We propose ModTGCN, a modularity-aware graph neural network for text classification that jointly optimizes cross-entropy and a modularity-based auxiliary objective to promote class-coherent document communities while preserving discriminative representations.
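To make the joint objective concrete, the following is a minimal NumPy sketch of a cross-entropy loss combined with a soft modularity term. It is an illustration under stated assumptions, not the exact ModTGCN formulation: it assumes the standard modularity matrix B = A - d d^T / (2m), uses the softmax class probabilities as soft community assignments, and introduces a hypothetical weight `lam` to balance the two terms.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

def soft_modularity(adj, assign):
    # Q = Tr(S^T B S) / (2m), with B = A - d d^T / (2m).
    # `assign` (S) holds one row of community probabilities per document.
    deg = adj.sum(axis=1)
    two_m = deg.sum()
    B = adj - np.outer(deg, deg) / two_m
    return np.trace(assign.T @ B @ assign) / two_m

def joint_loss(logits, labels, adj, lam=0.1):
    # Assumption: modularity is maximized, so it enters the loss negated.
    # `lam` is a hypothetical trade-off weight, not a value from the paper.
    probs = softmax(logits)
    return cross_entropy(probs, labels) - lam * soft_modularity(adj, probs)
```

On a toy graph with two disconnected edges (0-1 and 2-3), assigning each connected pair to its own community gives a modularity of 0.5, while splitting each pair across communities gives -0.5, so the auxiliary term rewards predictions that agree with the graph's community structure.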
The modularity term is computed on a document-document similarity graph derived from transformer embeddings (pretrained or fine-tuned). To improve scalability, we decouple the original heterogeneous TextGCN graph into separate document-word and word-word components, which yields 2x to 10x faster training.
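One way such a document-document similarity graph could be built is sketched below. The abstract does not specify the construction, so this example assumes a symmetric k-nearest-neighbor graph over cosine similarities; the function name `knn_cosine_graph` and the parameter `k` are hypothetical, and `emb` stands in for transformer document embeddings obtained elsewhere.

```python
import numpy as np

def knn_cosine_graph(emb, k=2):
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Exclude self-similarity from the neighbor search.
    np.fill_diagonal(sim, -np.inf)

    n = emb.shape[0]
    # Indices of the k most similar documents for each row.
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()

    adj = np.zeros((n, n))
    # Keep only non-negative similarities as edge weights.
    adj[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    # Symmetrize: keep an edge if either endpoint selected it.
    return np.maximum(adj, adj.T)
```

The resulting weighted adjacency matrix can be passed directly to a modularity computation. Other choices, such as thresholding the similarity matrix instead of taking nearest neighbors, would fit the same description.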
We further study graph construction strategies, label-aware edge reweighting, and supervision choices for modularity optimization. Experiments on five benchmarks show consistent gains, with larger improvements on complex, low-homophily datasets such as Ohsumed and 20NG.