NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track


Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it received the Best Dynamic Evaluation award in the text-to-text track.

Rather than maximizing benchmark scores, this work proposes a principled pipeline that decomposes knowledge synthesis into three coordinated phases (retrieval, curation, and composition), each governed by explicit intermediate representations and handoff contracts.

Inspired by Agentic Context Engineering (ACE), the system introduces temporal-semantic reranking, bounded contradiction reconciliation, and citation-preserving composition as core architectural primitives.

Competition results show that NightFeats surpasses proprietary baselines, including Claude-SonnetV2 and Nova-Pro, on both LLM-as-a-Judge and Human Likert evaluations. This indicates that systems built on architectural transparency and verifiable evidence grounding align better with human preferences than systems optimized narrowly for automatic similarity metrics.
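
As an illustration of the temporal-semantic reranking primitive named above, the following is a minimal sketch, not the NightFeats implementation. It assumes a reranker that blends a query-similarity score with an exponential recency decay. The `Passage` record, the `temporal_semantic_score` and `rerank` functions, and the `alpha` and `half_life_days` parameters are all hypothetical names and values chosen for this example; the abstract does not specify the actual scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """Hypothetical handoff record passed from retrieval to curation."""
    doc_id: str            # identifier kept so later stages can cite the source
    text: str
    semantic_score: float  # similarity to the query, assumed to lie in [0, 1]
    age_days: float        # time elapsed since the source was published


def temporal_semantic_score(p: Passage, alpha: float = 0.7,
                            half_life_days: float = 365.0) -> float:
    """Blend semantic similarity with an exponential recency decay.

    alpha and half_life_days are illustrative defaults (assumptions),
    not values reported for NightFeats.
    """
    recency = 0.5 ** (p.age_days / half_life_days)  # 1.0 when new, halves each half-life
    return alpha * p.semantic_score + (1.0 - alpha) * recency


def rerank(passages: list[Passage], **kwargs) -> list[Passage]:
    """Return passages ordered from highest to lowest blended score."""
    return sorted(passages,
                  key=lambda p: temporal_semantic_score(p, **kwargs),
                  reverse=True)


if __name__ == "__main__":
    older = Passage("a", "older but closer match", 0.90, 1460.0)
    newer = Passage("b", "recent, slightly weaker match", 0.80, 0.0)
    for p in rerank([older, newer]):
        print(p.doc_id, temporal_semantic_score(p))
```

Under these assumed defaults, the recent passage outranks the older one even though its semantic score is lower; setting `alpha=1.0` disables the temporal term and restores a purely semantic ordering. Carrying `doc_id` through the record is one simple way a handoff contract could keep citations attached to evidence in later stages.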