Quantifying Prior Dominance in RAG Systems
Abstract: Retrieval-Augmented Generation (RAG) grounds Large Language Models (LLMs) in external knowledge, yet current evaluations rely on discrete heuristics that suffer from “epistemic blindness”: they fail to distinguish genuine extraction of contextual information from recall of parametric memory.
To address this, we introduce the Normalized Context Utilization (NCU) metric, leveraging continuous token log-probabilities across zero-shot, oracle, and adversarial conditions to strictly quantify contextual information gain.
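The abstract does not state the NCU formula, so the following is only an illustrative sketch of how a log-probability-based gain of this kind might be computed. It assumes the same gold answer is scored in all three conditions, and it normalizes the gain by the zero-shot surprisal. Both the normalization and the function names `sequence_logprob` and `normalized_gain` are assumptions made for illustration, not the authors' definition.

```python
import math

def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities into one answer-level log-probability."""
    return sum(token_logprobs)

def normalized_gain(lp_zero_shot, lp_with_context, eps=1e-9):
    """Hypothetical normalization (not taken from the paper).

    Returns the fraction of zero-shot surprisal removed by the context:
    positive when the context raises the answer's probability,
    negative when it lowers it.
    """
    surprisal = max(-lp_zero_shot, eps)  # guard against division by zero
    return (lp_with_context - lp_zero_shot) / surprisal

# Made-up token log-probabilities for a two-token gold answer.
zero_shot = sequence_logprob([math.log(0.5), math.log(0.5)])     # p = 0.25
oracle = sequence_logprob([math.log(0.5), math.log(1.0)])        # p = 0.5
adversarial = sequence_logprob([math.log(0.5), math.log(0.25)])  # p = 0.125

oracle_gain = normalized_gain(zero_shot, oracle)
adversarial_gain = normalized_gain(zero_shot, adversarial)
print(oracle_gain, adversarial_gain)
```

Under this assumed normalization, a negative value in the oracle condition would indicate that supplying the supporting context lowered the model's confidence in the gold answer.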
Evaluating architectures ranging from 1.5B to 72B parameters alongside a proprietary commercial API reveals that for strict factual extraction (without Chain-of-Thought reasoning), traditional scaling laws exhibit extreme diminishing returns: highly efficient Small Language Models (SLMs) match or outperform high-capacity architectures.
Furthermore, we demonstrate that “Prior Dominance” correlates with model scale and proprietary alignment. The evaluated commercial API not only overrode explicit external evidence in nearly half of adversarial conflicts, but also frequently suffered a systemic confidence collapse (Negative Transfer) when its parametric priors were contradicted.
Our findings highlight the structural epistemic advantage and superior contextual adherence of SLMs in strict extraction workflows.