Context Recycling for Long-Horizon LLM Inference

Large language models (LLMs) reason well over short contexts, but their performance degrades over long conversational horizons because of context window limits and inefficient token usage.

We introduce ContextForge, a context recycling system that maintains task-relevant information across turns by combining structured query generation, external memory retrieval, and controlled synthesis.
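
To make the three stages concrete, the following is a minimal sketch of one turn of such a pipeline. It is not the ContextForge implementation: the names MemoryStore, generate_query, and build_prompt are hypothetical, keyword overlap stands in for whatever retrieval method the system actually uses, and the synthesis step is reduced to assembling a bounded prompt.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical external memory holding short notes from earlier turns."""
    notes: list = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

    def retrieve(self, keywords: set, k: int = 2) -> list:
        # Score each note by keyword overlap (an assumption for illustration;
        # a real system might use embeddings or a structured index).
        scored = [(len(keywords & set(n.lower().split())), n) for n in self.notes]
        scored = [(s, n) for s, n in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in scored[:k]]


# Small illustrative stopword list, not taken from the source.
STOPWORDS = {"what", "does", "the", "have", "a", "an", "is", "of", "to"}


def generate_query(user_turn: str) -> set:
    """Structured query generation, reduced here to keyword extraction."""
    words = {w.strip("?.,!").lower() for w in user_turn.split()}
    return words - STOPWORDS


def build_prompt(user_turn: str, memory: MemoryStore, k: int = 2) -> str:
    """Controlled synthesis input: only the top-k retrieved notes are included,
    so prompt size stays bounded regardless of conversation length."""
    retrieved = memory.retrieve(generate_query(user_turn), k)
    context = "\n".join(f"- {n}" for n in retrieved)
    return f"Relevant notes:\n{context}\n\nUser: {user_turn}"


memory = MemoryStore()
memory.add("patient reports penicillin allergy")
memory.add("appointment scheduled for monday")
memory.add("blood pressure reading was 120/80")

prompt = build_prompt("What allergy does the patient have?", memory)
print(prompt)
```

In this sketch the prompt carries only the note about the allergy, while the unrelated notes stay in memory instead of being replayed.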

The system reuses prior computation efficiently without replaying the full context, reducing token overhead while preserving answer quality.
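
The token argument can be illustrated with a rough count. The sketch below compares cumulative input tokens when every turn replays the full history against a scheme that sends only the current turn plus a fixed retrieval budget. The function names and all numbers (200 tokens per turn, a 400-token budget) are assumptions chosen for illustration, not measurements from the evaluation.

```python
def full_replay_tokens(turns: int, tokens_per_turn: int) -> int:
    """Cumulative input tokens when turn t resends all t turns so far."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))


def recycled_tokens(turns: int, tokens_per_turn: int, retrieved_budget: int) -> int:
    """Cumulative input tokens when each turn sends only the current turn
    plus a fixed budget of retrieved context."""
    return turns * (tokens_per_turn + retrieved_budget)


# Illustrative values only: 15 turns, 200 tokens per turn, 400-token budget.
replay = full_replay_tokens(15, 200)
recycled = recycled_tokens(15, 200, 400)
print(replay, recycled)  # 24000 9000
```

Under these assumptions, full replay grows quadratically with the number of turns, while the bounded scheme grows linearly, so the gap widens as conversations get longer.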

We evaluate ContextForge on a 15-turn conversational benchmark that tests multi-turn reasoning, back-references, and domain shifts across structured healthcare queries.

Compared with a baseline agent that uses the same underlying models, ContextForge is more consistent and consumes fewer tokens while maintaining comparable response accuracy.

These results suggest that context recycling is a practical way to extend LLM capabilities in long-horizon tasks without requiring larger context windows or model retraining.