In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

Abstract: In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation.

We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide.
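The correspondence can be checked numerically in a simplified setting. The sketch below is not the unified linearized RAG objective itself, which the abstract does not spell out; it assumes a plain least-squares objective over hypothetical (embedding, target) pairs standing in for retrieved evidence, and a hand-set linear self-attention layer without softmax. All names (`gd_step_prediction`, `linear_attention_prediction`, the projection matrices) are illustrative.

```python
import numpy as np


def gd_step_prediction(X, Y, x_query, lr):
    """Take one gradient-descent step from W = 0 on
    L(W) = 1/(2N) * sum_i ||W x_i - y_i||^2, then predict for x_query."""
    n = X.shape[0]
    W0 = np.zeros((Y.shape[1], X.shape[1]))
    grad = (W0 @ X.T - Y.T) @ X / n          # dL/dW, shape (d_y, d_x)
    W1 = W0 - lr * grad
    return W1 @ x_query


def linear_attention_prediction(X, Y, x_query, lr):
    """One linear self-attention layer (no softmax) with hand-set weights.
    Context tokens are [x_i, y_i]; the query token is [x_query, 0]."""
    n, d_x = X.shape
    d_y = Y.shape[1]
    d = d_x + d_y
    tokens = np.hstack([X, Y])                        # (n, d)
    query_token = np.concatenate([x_query, np.zeros(d_y)])

    # Keys and queries read the x-part; values read the y-part.
    W_K = np.zeros((d, d))
    W_K[:d_x, :d_x] = np.eye(d_x)
    W_Q = W_K.copy()
    W_V = np.zeros((d, d))
    W_V[d_x:, d_x:] = np.eye(d_y)
    P = (lr / n) * np.eye(d)                          # output projection

    K = tokens @ W_K.T                                # (n, d)
    V = tokens @ W_V.T                                # (n, d)
    q = W_Q @ query_token                             # (d,)
    update = P @ (V.T @ (K @ q))                      # sum_i v_i * (k_i . q)
    out = query_token + update                        # residual connection
    return out[d_x:]                                  # read off the y-slot
```

Under these assumptions both functions return (lr / N) * sum_i y_i (x_i . x_query), so their outputs agree up to floating-point error for any context, query, and step size.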

We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but under nonlinear architectures it comes to depend on the feature distribution.

Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
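To make the idea of a forward-only, context-conditioned update concrete, here is a toy sketch. It is not the method evaluated in this work: the abstract does not specify the interface or the update predictor, so the rank-1 parameterization, the mean pooling, and every name below (`W_shared`, `A_u`, `A_v`, `predict_interface_update`, `evidence_features`) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical embedding width

# Stand-in for a shared evidence-use interface applied to every query.
W_shared = rng.normal(size=(d, d)) / np.sqrt(d)

# Hypothetical small predictor: maps pooled context to a rank-1 update.
A_u = rng.normal(size=(d, 2 * d)) * 0.1
A_v = rng.normal(size=(d, 2 * d)) * 0.1


def predict_interface_update(query_vec, doc_vecs):
    """Predict an update to the interface in one forward pass.
    No gradients are computed at query time."""
    context = np.concatenate([query_vec, doc_vecs.mean(axis=0)])
    u = A_u @ context
    v = A_v @ context
    return np.outer(u, v)                     # (d, d), rank at most 1


def evidence_features(query_vec, doc_vecs, adapt=True):
    """Project retrieved-document vectors through the interface.
    adapt=False reproduces the shared-interface baseline."""
    W = W_shared
    if adapt:
        # Per-query copy; W_shared itself is never modified.
        W = W_shared + predict_interface_update(query_vec, doc_vecs)
    return doc_vecs @ W.T
```

In this sketch the per-query cost is a few matrix-vector products, whereas test-time gradient adaptation would need at least one backward pass per query; that contrast is the design point the sketch is meant to illustrate.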
