QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning


Abstract: This paper presents an overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance.

Unlike conventional question-answering benchmarks, QIAS 2026 focuses on end-to-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary.

To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of 12,500 Arabic inheritance cases annotated with intermediate reasoning steps and final answers. System submissions were evaluated using MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning.

A total of 16 teams participated in the shared task, investigating a range of approaches, including prompting-based methods, retrieval-augmented generation, and fine-tuning strategies. The results show that Islamic inheritance reasoning remains highly challenging for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results.