Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B to 671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls.

Across dense and Mixture-of-Experts architectures, we find no evidence of Semitic-specific transfer: models with weak baselines improve dramatically across all languages, while strong-baseline models show only marginal gains regardless of language family.
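
The comparison behind this claim can be sketched as a difference in mean accuracy gains between the two language groups. The following is a minimal illustration, not the paper's evaluation code: the function name `family_gain_gap`, the choice of languages, and all accuracy values are hypothetical placeholders, not reported results. A gap near zero means fine-tuning helped Semitic languages no more than the controls.

```python
from statistics import mean


def family_gain_gap(baseline, finetuned, semitic):
    """Mean accuracy gain on Semitic languages minus mean gain on controls.

    baseline / finetuned: dicts mapping language name -> accuracy (percent).
    semitic: set of language names treated as Semitic; all others are controls.
    """
    gains = {lang: finetuned[lang] - baseline[lang] for lang in baseline}
    semitic_gain = mean(g for lang, g in gains.items() if lang in semitic)
    control_gain = mean(g for lang, g in gains.items() if lang not in semitic)
    return semitic_gain - control_gain


# Hypothetical placeholder accuracies (percent), NOT results from the paper.
baseline = {"hebrew": 40, "amharic": 30, "swahili": 35, "turkish": 45}
finetuned = {"hebrew": 70, "amharic": 60, "swahili": 65, "turkish": 75}

gap = family_gain_gap(baseline, finetuned, semitic={"hebrew", "amharic"})
print(f"Semitic minus control gain: {gap} points")
```

In this toy setup every language gains the same amount, so the gap is zero, which is the pattern the abstract describes for weak-baseline models. A Semitic-specific effect would instead show up as a clearly positive gap.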

A chain-of-thought ablation reinforces this finding: the same models that benefit most from fine-tuning benefit equally from inference-time reasoning, suggesting that both mechanisms address task-format alignment rather than cross-lingual knowledge transfer.
