One Year Later...The Harms Persist, But So Do We!

Abstract: General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate and inconsistent across clinical conditions. This study evaluates six proprietary LLMs across 16 DSM-5 conditions using four adversarial attack variants, introducing an eight-dimension harm taxonomy and a multi-dimensional evaluation framework.

Results show that safeguards hold reliably only for suicide and self-harm, while conditions such as eating disorders, substance use disorder, and major depressive disorder exhibit failure rates of up to 100%. We argue that ethical design and deployment of these LLMs demand clearly defined harm categories across clinical conditions and implementation of safeguards accordingly.

Until such safeguards are in place, these models pose significant risks to vulnerable populations, making their growing integration into educational settings particularly concerning.


Paper Details:

  • Authors: Annika Marie Schoene, Cansu Canca, Gautham Vijay Kumar, Anson Antony
  • Date: 22 Jun 2026
  • Subject: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • DOI: 10.48550/arXiv.2606.23884
