NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track.

Rather than maximizing benchmark scores, this work proposes a principled pipeline that decomposes knowledge synthesis into three coordinated phases (retrieval, curation, and composition), each governed by explicit intermediate representations and handoff contracts.
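To make the idea of phases joined by handoff contracts concrete, the following is a minimal sketch, not the NightFeats implementation. It assumes each contract can be modelled as an immutable typed record; the names `Passage`, `CuratedEvidence`, `Answer`, `retrieve`, `curate`, `compose`, and `run_pipeline` are hypothetical, and the logic inside each phase is a toy stand-in.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Passage:
    """Hypothetical retrieval output: one passage plus its source identifier."""
    source_id: str
    text: str


@dataclass(frozen=True)
class CuratedEvidence:
    """Hypothetical curation output: the passages kept for composition."""
    passages: List[Passage]


@dataclass(frozen=True)
class Answer:
    """Hypothetical composition output: answer text plus the sources it cites."""
    text: str
    citations: List[str]


def retrieve(query: str, corpus: List[Passage]) -> List[Passage]:
    # Toy retrieval: keep passages sharing at least one lowercase token with the query.
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.text.lower().split())]


def curate(passages: List[Passage]) -> CuratedEvidence:
    # Toy curation: drop passages with duplicate text, preserving first-seen order.
    seen = set()
    kept = []
    for p in passages:
        if p.text not in seen:
            seen.add(p.text)
            kept.append(p)
    return CuratedEvidence(passages=kept)


def compose(evidence: CuratedEvidence) -> Answer:
    # Toy composition: concatenate passages, tagging each with its source identifier.
    parts = [f"{p.text} [{p.source_id}]" for p in evidence.passages]
    return Answer(text=" ".join(parts),
                  citations=[p.source_id for p in evidence.passages])


def run_pipeline(query: str, corpus: List[Passage]) -> Answer:
    # Each phase consumes only the typed record produced by the previous phase.
    return compose(curate(retrieve(query, corpus)))
```

Because every phase accepts and returns a fixed record type, a phase can be replaced or tested in isolation as long as it honours the same contract.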

Inspired by Agentic Context Engineering (ACE), the system introduces temporal-semantic reranking, bounded contradiction reconciliation, and citation-preserving composition as core architectural primitives.
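The abstract does not specify how temporal-semantic reranking is scored. As one plausible reading, the sketch below blends cosine similarity with an exponential recency decay. The blending formula, the `half_life_days` and `recency_weight` parameters, and the `Candidate` type are all assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical reranking input."""
    source_id: str
    embedding: Sequence[float]  # assumed to be precomputed by some embedding model
    age_days: float             # assumed age of the source at query time


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query_embedding: Sequence[float],
           candidates: List[Candidate],
           half_life_days: float = 365.0,
           recency_weight: float = 0.3) -> List[Candidate]:
    # Assumed scoring rule: a weighted sum of semantic similarity and a
    # recency term that halves every `half_life_days`.
    def score(c: Candidate) -> float:
        recency = 0.5 ** (c.age_days / half_life_days)
        semantic = cosine(query_embedding, c.embedding)
        return (1.0 - recency_weight) * semantic + recency_weight * recency

    return sorted(candidates, key=score, reverse=True)
```

With `recency_weight=0.0` the ordering reduces to plain semantic ranking, which gives a simple baseline for checking how much the temporal term changes the result.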

Competition results show that NightFeats surpasses proprietary baselines, including Claude-SonnetV2 and Nova-Pro, on LLM-as-a-Judge and Human Likert evaluations, indicating that architectural transparency and verifiable evidence grounding align better with human preferences than optimizing narrowly for automatic similarity metrics.