How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses
Abstract: We examine whether frontier chat-based large language models (LLMs) adjust their outputs in response to neurodivergence (ND) context supplied in system prompts, and we characterize the nature of those adjustments. Specifically, we propose NDBench, a 576-output benchmark spanning two frontier models, three system-prompt types (baseline, ND-profile assertion, and ND-profile assertion with explicit adjustment instructions), four canonical ND profiles, and 24 prompts across four categories, one of which involves an adversarial masking strategy.
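The 576-output figure follows from fully crossing the four design factors. A minimal sketch of that grid, assuming a fully crossed design (the model, prompt-type, and profile labels below are placeholders, not the benchmark's actual identifiers):

```python
from itertools import product

# Placeholder labels; NDBench's actual model/profile names may differ.
models = ["model_a", "model_b"]                                # two frontier models
prompt_types = ["baseline", "nd_profile", "nd_profile_instructed"]  # three system-prompt types
profiles = ["profile_1", "profile_2", "profile_3", "profile_4"]     # four canonical ND profiles
prompts = [f"prompt_{i:02d}" for i in range(24)]               # 24 prompts across four categories

# Every (model, prompt type, profile, prompt) combination yields one output.
conditions = list(product(models, prompt_types, profiles, prompts))
print(len(conditions))  # 2 * 3 * 4 * 24 = 576
```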
Four trends emerge consistently from our findings. First, LLMs adapt significantly under ND context: fully instructed conditions yield longer, more structured outputs, with higher token counts, more headings, and more granular steps (p < 10^-8, Holm-corrected). Second, this adaptation is largely structural: list density changes little, while heading frequency and per-step detail rise markedly.
Third, ND persona assertion alone fails to suppress potentially harmful tendencies: masking reinforcement decreases only under explicit instruction (a 36-44% reduction) and barely changes under persona assertion alone. Fourth, reliability analysis of LLM-based harm assessment reveals that only two of the six dimensions (masking reinforcement and validation quality) meet the pre-defined inter-judge agreement criterion (alpha >= 0.67) and can therefore be treated as primary results. NDBench, together with its prompts, outputs, code, and other resources, is publicly available as a reproducible framework for auditing future LLMs' adaptation to ND awareness.