MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
Abstract: The use of large language models (LLMs) for complex mathematical reasoning is an emergent area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English.
We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil.
We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight models, but that their performance decreases on questions with figures and on open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.
Paper Details:
- Authors: Tiago Teixeira, Ana Carolina Erthal, Juan Belieni, Beatriz Canaverde, Diego Mesquita, Miguel Faria, Eliezer de Souza da Silva, André F. T. Martins
- Submitted: 1 Apr 2026
- Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)