Evaluating Reasoning Models for Queries with Presuppositions

Abstract: Millions of users turn to AI models for their information needs. It is conceivable that a large number of user queries contain assumptions that may be factually inaccurate. Prior work notes that large language models (LLMs) often fail to challenge such erroneous assumptions, and can reinforce users’ misinformed opinions.

However, given recent advances, especially in models’ reasoning capabilities, we revisit whether large reasoning models (LRMs) can reason about the underlying assumptions and respond to user queries appropriately. We construct queries with varying degrees of presupposition spanning health, science, and general knowledge, and use them to evaluate several widely deployed models.
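
To make "varying degrees of presupposition" concrete, here is a purely illustrative sketch (not from the paper): the same factually false claim can be phrased neutrally, hedged, or asserted outright. The claim, template wording, and strength labels below are all hypothetical examples, not the authors' dataset.

```python
# Illustrative only: one false claim phrased at increasing
# presupposition strength. The claim and templates are hypothetical.
FALSE_CLAIM = "vitamin C cures the common cold"

def build_queries(claim: str) -> dict[str, str]:
    """Phrase the same claim with increasing presupposition strength."""
    return {
        "neutral": f"Is it true that {claim}?",                  # open question, no presupposition
        "weak":    f"I heard that {claim}. Why is that?",        # hedged presupposition
        "strong":  f"Given that {claim}, what should I do?",     # claim asserted as fact
    }

for strength, query in build_queries(FALSE_CLAIM).items():
    print(f"{strength}: {query}")
```

Under this framing, a model that answers the "strong" variant without flagging the embedded claim has failed to challenge the presupposition, which is the behavior the paper measures.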

Compared to non-reasoning models, we find that reasoning models achieve slightly higher accuracy (by 2-11%), but they still fail to challenge a large fraction (26-42%) of false presuppositions. Further, reasoning models remain sensitive to how strongly the presupposition is expressed.


Paper Details:

  • Authors: Rose Sathyanathan, Kinshuk Vasisht, Danish Pruthi
  • arXiv ID: 2605.03050
  • Subject: Computation and Language (cs.CL)
  • Submission Date: 4 May 2026
