SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

Abstract: Despite their remarkable success, large language models (LLMs) still face bottlenecks when deployed in dynamic, real-world settings, chiefly concept drift and the high cost of gradient-based adaptation.

Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without incurring catastrophic forgetting or requiring extensive manual data curation.

To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that uses parameter-level meta-learning to self-improve, treating model weights as an environment for exploration.

SOLAR begins by consolidating a strong prior over common-sense knowledge, which makes it effective for transfer learning.

Using a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains.

Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, which implicitly acts as an episodic memory buffer that balances plasticity (adaptation to new tasks) and stability (retention of meta-knowledge).

Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social, and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.