Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

因果关联：利用多语言微调进行金融问答 @ FinCausal 2026

Abstract: This paper describes team HSA_CORAL’s submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish.

摘要： 本文介绍了 HSA_CORAL 团队在 FinCausal 2026 共享任务中的参赛方案，该任务旨在通过英语和西班牙语的抽取式问答，从金融叙述中提取因果关系。

We compare three modeling families: (i) encoder-only token tagging with multilingual BERT, (ii) encoder-decoder generation with multilingual BART, and (iii) decoder-only LLMs (Llama 3.1 and GPT variants) using prompt refinement, few-shot demonstrations, and supervised fine-tuning.

我们比较了三个模型系列：(i) 使用多语言 BERT 的仅编码器（encoder-only）标记模型；(ii) 使用多语言 BART 的编码器-解码器（encoder-decoder）生成模型；以及 (iii) 使用提示词优化、少样本演示和监督微调的仅解码器（decoder-only）大语言模型（Llama 3.1 和 GPT 变体）。

Across settings, prompting and few-shot examples yield competitive performance, while supervised fine-tuning provides the largest gains.

在各种设置下，提示词工程和少样本示例均能产生具有竞争力的表现，而监督微调则带来了最大的性能提升。

Our best system, GPT-4.1 Mini fine-tuned on combined English and Spanish training data, achieves a tied highest score on the English subtask (score 4.8140) and ranks third on Spanish (score 4.7753) under the shared task’s LLM-as-a-judge metric.

我们表现最好的系统是基于英语和西班牙语混合训练数据微调的 GPT-4.1 Mini，该系统在共享任务的“大模型作为裁判”（LLM-as-a-judge）指标下，在英语子任务中并列第一（得分 4.8140），在西班牙语子任务中排名第三（得分 4.7753）。

Overall, the results highlight the value of task-specific adaptation and multilingual fine-tuning for cross-lingual transfer in financial causality QA.

总体而言，研究结果突显了任务特定适配和多语言微调在金融因果问答跨语言迁移中的价值。