Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models
Abstract: Context-augmented language model systems often wrap supplied content in labels such as Reference:, Evidence:, Instruction:, Note:, or Example:, but the effect of these labels on reader-model behavior remains underexplored.
We introduce a paired fixed-content probe over 500 MMLU-Pro items: each item receives the same misleading answer-bearing assertion under different discourse-role labels, and adoption is measured by whether the model outputs the injected wrong option.
Across GPT-5.5, DeepSeek V4 Pro, Llama-3-8B-Instruct, and Qwen2.5-7B-Instruct, the Misleading Adoption Rate shifts by 56-84 percentage points depending on the label. Binding or source-like labels such as Instruction: and Reference: produce high adoption, whereas Example: consistently suppresses it.
Paired tests, bootstrap intervals, final-instruction ablations, and Qwen final-step log-probability probes support a label-conditioned candidate preference. Boundary probes show where the effect weakens and where it persists: arithmetic tasks reduce adoption, passage-shaped external context preserves smaller label gaps, short-answer evaluation rules out option-letter copying, and nested-label conflicts suggest that illustrative framing can delimit adoption scope.
A 200-case single-author manual audit confirms that the short-answer contrasts are stable under conservative adjudication. The resulting claim is bounded but practical: context-utilization and reader-side RAG benchmarks should report and control wrapper labels, because presentation choices can change measured reliance on supplied context.
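As a rough illustration of the paired fixed-content design, the Python sketch below builds label variants of one item, computes an adoption rate, and estimates a paired bootstrap interval for the gap between two labels. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Item` fields, the prompt template, the wording of the injected assertion, and the percentile bootstrap settings are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Labels named in the abstract; the exact wrapper format is an assumption.
LABELS = ["Reference", "Evidence", "Instruction", "Note", "Example"]


@dataclass
class Item:
    """Hypothetical container for one multiple-choice item."""
    question: str
    options: Dict[str, str]  # option letter -> option text
    gold: str                # correct option letter
    injected: str            # misleading (wrong) option letter


def build_prompt(item: Item, label: str) -> str:
    """Wrap the same misleading assertion under a given label (assumed template)."""
    assertion = f"The correct answer is ({item.injected}) {item.options[item.injected]}."
    opts = "\n".join(f"({k}) {v}" for k, v in item.options.items())
    return f"{label}: {assertion}\n\nQuestion: {item.question}\n{opts}\nAnswer:"


def adoption_rate(adopted: Sequence[int]) -> float:
    """Fraction of items where the model output the injected wrong option."""
    return sum(adopted) / len(adopted)


def paired_bootstrap_gap(
    adopt_a: Sequence[int],
    adopt_b: Sequence[int],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Gap in adoption rate (label A minus label B) with a 95% percentile interval.

    Items are resampled as pairs, so each resample keeps both labels'
    outcomes for the same underlying question together.
    """
    rng = random.Random(seed)
    diffs: List[int] = [a - b for a, b in zip(adopt_a, adopt_b)]
    n = len(diffs)
    gaps = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        gaps.append(sum(sample) / n)
    gaps.sort()
    lo = gaps[int(0.025 * n_boot)]
    hi = gaps[int(0.975 * n_boot) - 1]
    return sum(diffs) / n, lo, hi


if __name__ == "__main__":
    item = Item(
        question="Which planet is closest to the Sun?",
        options={"A": "Venus", "B": "Mercury", "C": "Mars"},
        gold="B",
        injected="A",
    )
    for label in LABELS:
        print(build_prompt(item, label).splitlines()[0])

    # Toy adoption indicators (1 = model output the injected option), made up
    # purely to exercise the functions; they are not results from the paper.
    instruction = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    example = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(paired_bootstrap_gap(instruction, example))
```

Because every label variant shares the same question, options, and injected assertion, any difference in adoption between two labels in this setup is attributable to the wrapper alone, which is why the interval is computed on per-item differences rather than on two independent samples.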