Counterargument for Critical Thinking as Judged by AI and Humans

Abstract: This intervention study investigates how students use counterarguments in writing to exercise critical thinking in the context of Generative AI (GenAI). The question is especially relevant because the use of GenAI carries risks of cheating and cognitive offloading. We presented 36 students in a university course with 4 carefully selected thesis statements (drawn from a set of popular debates) and asked each student to write about any one of them. Each writeup received three human assessments (two student peer reviews and one by an experienced teacher) on a 5-point Likert scale against six established rubric criteria (focus, logic, content, style, correctness and reference). All 35 qualified submissions (n = 35) were assessed, after one was disqualified for irregularity.

Using the same rubrics and guidelines, we also assessed the submissions with six frontier LLMs as judges. Our mixed-method design combined qualitative open-ended feedback per assessment with quantitative methods. The results reveal that (1) the students’ self-written counterarguments to AI-generated content contain logic, among other things, and logic is a key component of critical thinking; and (2) GenAI can be used successfully at scale to assess students’ written work against clear rubrics, and these assessments generally align with human assessments, as shown by Gwet’s AC2 inter-rater reliability values of 0.33 for all the models except one.
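
For readers unfamiliar with the statistic, the following is a minimal sketch of how Gwet's AC2 could be computed for two raters scoring the same items on a 5-point scale. It is an illustration under stated assumptions, not the study's actual procedure: the abstract does not say which weighting scheme or how many raters per comparison were used, so the linear ordinal weights, the two-rater setup, and the function name `gwet_ac2` are all hypothetical choices made here.

```python
from typing import Sequence


def gwet_ac2(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    categories: Sequence[int] = (1, 2, 3, 4, 5),
) -> float:
    """Gwet's AC2 for two raters, using linear ordinal weights (an assumption)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of items.")

    q = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Linear weights: full credit for exact agreement, partial credit
    # that shrinks as the two scores move further apart on the scale.
    weights = [[1.0 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]

    # Joint proportions: share of items rated k by A and l by B.
    joint = [[0.0] * q for _ in range(q)]
    for a, b in zip(rater_a, rater_b):
        joint[index[a]][index[b]] += 1.0 / n

    # Weighted observed agreement.
    p_a = sum(weights[k][l] * joint[k][l] for k in range(q) for l in range(q))

    # Average marginal probability of each category across both raters.
    pi = [
        (sum(joint[k]) + sum(row[k] for row in joint)) / 2.0
        for k in range(q)
    ]

    # Chance agreement as defined for AC2.
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)

    return (p_a - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not data from the study.
    teacher = [4, 3, 5, 2, 4, 3]
    llm_judge = [4, 4, 5, 3, 4, 2]
    print(round(gwet_ac2(teacher, llm_judge), 3))
```

With identity weights (credit only for exact matches) the same formula reduces to Gwet's AC1; the weighted form is the usual choice for ordinal Likert scores because near-misses count as partial agreement.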
