AI outperforms law professors in Stanford Law study
A groundbreaking study led by Stanford Law School Professor Julian Nyarko reveals that law professors overwhelmingly prefer AI-generated answers to student questions over responses written by their fellow instructors—a finding that could reshape how legal education is delivered. The study, titled “Law Professors Prefer AI Over Peer Answers,” was conducted with 16 law professors across U.S. law schools and tested whether large language models could serve as effective tutors for contract law courses.
In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups. “This study challenges important assumptions about AI’s role in legal education,” said Nyarko, who leads Stanford Law School’s Legal Innovation through Frontier Technology Lab, or liftlab. He co-authored the paper with colleagues from Yale, NYU, the University of Chicago, and other leading institutions. “We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity, not just factual recall.”
Can LLMs Reason?
The study is particularly notable because previous AI evaluations have focused primarily on subjects with clear right-or-wrong answers. Legal reasoning, by contrast, demands careful analysis of competing arguments and defensible conclusions.
“We were frankly surprised by the magnitude of the results,” Nyarko added. “These weren’t just simple questions with obvious answers. Many of them required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop their own analytical skills.”
Participants created 40 representative contract law questions that students might ask after class or during office hours, wrote their own answers, and then evaluated responses without knowing whether they came from AI or other participating professors. The AI systems performed comparably to the best human instructor in the study. Perhaps most striking: professors flagged AI responses as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers.
“In most fields where AI gets tested, there’s a right answer. In law, there often isn’t,” said Sarath Sanga, co-author and professor at Yale Law School. “Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other’s arguments. In this case, the answer was yes.”
The research team took extensive precautions to ensure the study’s validity. They calibrated AI responses to match the length and structure of human answers, used multiple evaluation methods, and had professors assess whether responses might mislead or confuse students.
Transforming Legal Education
“We designed this study to be as rigorous as possible because the stakes are so high,” Nyarko explained. “Legal education is about training future lawyers to think critically, argue persuasively, and navigate ethical complexities. Our study makes important steps towards finding out whether AI could support that mission.”
Alejandro Salinas, first author of the study and a researcher at Nyarko’s liftlab, emphasized the educational implications: “Our study shifts attention to what AI tutoring can contribute to learning in judgment-rich fields like law. We find that, when evaluated by legal educators, AI tutors can offer high-quality, on-demand support that complements classroom instruction, and may broaden access to expert guidance.”
The study also examined specific AI models, including commercial tutoring systems and Google’s NotebookLM, finding varying levels of performance. However, even when context limitations affected AI responses, professors still frequently preferred them to human-written alternatives.
The findings arrive as law schools nationwide grapple with integrating AI tools into legal education while maintaining rigorous academic standards. Some institutions have embraced AI experimentation, while others remain cautious about potential risks, including hallucinations, overreliance, and the erosion of critical thinking skills.
“Our study evaluates the quality of answers given by AI tools. But how to implement these tools to most effectively improve student learning is still an open question. So we’re not advocating for wholesale adoption of AI tutors,” Nyarko cautioned. “But our data suggests that blanket skepticism may be equally unwarranted. The conversation should shift from whether AI can give accurate, high-quality responses to how we can deploy it responsibly to the benefit of our students.”