One Year Later...The Harms Persist, But So Do We!

Abstract: General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate and inconsistent across clinical conditions. This study evaluates six proprietary LLMs across 16 DSM-5 conditions using four adversarial attack variants, introducing an eight-dimension harm taxonomy and a multi-dimensional evaluation framework.

Results show that safeguards hold reliably only for suicide and self-harm, while conditions such as eating disorders, substance use disorder, and major depressive disorder exhibit failure rates of up to 100%. We argue that ethical design and deployment of these LLMs demand clearly defined harm categories across clinical conditions and implementation of safeguards accordingly.

Until such safeguards are in place, these models pose significant risks to vulnerable populations, making their growing integration into educational settings particularly concerning.

Paper Details:

Authors: Annika Marie Schoene, Cansu Canca, Gautham Vijay Kumar, Anson Antony
Date: 22 Jun 2026
Subject: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
DOI: 10.48550/arXiv.2606.23884
