Which Models Perform Better in Inheritance Reasoning?

Abstract: This paper presents the participation of team PSL in the QIAS 2026 Shared Task on Arabic Islamic inheritance reasoning. The task evaluates the ability of large language models to solve inheritance cases that require legal interpretation, multi-step reasoning, and precise numerical computation. We compare commercial and open-source models under a unified prompting strategy to assess their effectiveness in structured legal reasoning with minimal task-specific adaptation.

Our results show a clear gap in reliability between the two model families. Commercial models demonstrate stronger performance in identifying eligible heirs, applying exclusion rules, and maintaining consistency across reasoning steps. In contrast, open-source models exhibit greater instability, particularly in cases involving dependent legal decisions and fractional share adjustments. The best performance is achieved by Gemini 2.5 Flash, with an MRE of $0.989$.
