NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

Abstract: As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft.

On April 16, 2026, NAVI-Orbital achieved what is, to the authors’ knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue.

The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue.
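
The orchestration pattern described above can be sketched as a small graph-based state machine. This is a minimal illustration in plain Python, not the flight software and not the LangGraph API: the state fields, node names, routing rule, and placeholder agent bodies are all hypothetical, and a real system would call the onboard vision-language model where the placeholders are.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MissionState:
    """Shared state passed between nodes (all fields are hypothetical)."""
    prompt: str                        # plain-English tasking from the operator
    scene_id: str                      # identifier of the captured scene
    question: Optional[str] = None     # optional operator follow-up
    label: Optional[str] = None        # scene class set by the detection agent
    description: Optional[str] = None  # text description of the scene
    reply: Optional[str] = None        # answer produced by the dialogue agent
    trace: List[str] = field(default_factory=list)  # nodes visited, in order


def detection_agent(state: MissionState) -> MissionState:
    # Placeholder: a real agent would run the vision-language model here.
    state.trace.append("detect")
    state.label = "unclassified"
    state.description = f"Scene {state.scene_id} analysed for: {state.prompt}"
    return state


def dialogue_agent(state: MissionState) -> MissionState:
    # Placeholder: a real agent would answer using the model and the scene.
    state.trace.append("dialogue")
    state.reply = f"Regarding '{state.question}': scene labelled '{state.label}'."
    return state


def route_after_detection(state: MissionState) -> str:
    # Conditional edge: enter dialogue only if the operator asked something.
    return "dialogue" if state.question else "end"


NODES: Dict[str, Callable[[MissionState], MissionState]] = {
    "detect": detection_agent,
    "dialogue": dialogue_agent,
}
EDGES: Dict[str, Callable[[MissionState], str]] = {
    "detect": route_after_detection,
    "dialogue": lambda state: "end",
}


def run_graph(state: MissionState, entry: str = "detect") -> MissionState:
    """Walk the graph from the entry node until an edge returns 'end'."""
    node = entry
    while node != "end":
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

In this sketch, re-tasking amounts to constructing a new `MissionState` with a different `prompt`; the graph itself does not change.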

Results span three stages: ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery. The in-orbit captures include uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument. Together, these results demonstrate the feasibility of running foundation models on satellite-class edge computers, and of using in-orbit semantic compression of Earth observations to invert the conventional acquire-then-downlink-everything bandwidth profile.
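
The bandwidth argument behind semantic compression can be made concrete with a short calculation. The sketch below compares the size of a raw image with the size of a text summary of the same scene. The image dimensions, band count, summary text, and link rate are hypothetical values chosen for illustration; none of them come from the mission.

```python
def compression_ratio(raw_bytes: int, summary_text: str) -> float:
    """Ratio of raw image size to the UTF-8 size of its text summary."""
    summary_bytes = len(summary_text.encode("utf-8"))
    if summary_bytes == 0:
        raise ValueError("summary must not be empty")
    return raw_bytes / summary_bytes


def downlink_seconds(n_bytes: int, rate_bps: float) -> float:
    """Time to transmit n_bytes over a link of rate_bps bits per second."""
    if rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return n_bytes * 8 / rate_bps


if __name__ == "__main__":
    # Hypothetical scene: 4096 x 4096 pixels, 3 bands, 1 byte per sample.
    raw_bytes = 4096 * 4096 * 3
    # Hypothetical onboard summary of that scene.
    summary = "class: port; description: container terminal beside a river mouth"
    rate_bps = 1_000_000  # hypothetical 1 Mbit/s downlink

    summary_bytes = len(summary.encode("utf-8"))
    print(f"compression ratio: {compression_ratio(raw_bytes, summary):.0f}x")
    print(f"raw downlink time: {downlink_seconds(raw_bytes, rate_bps):.1f} s")
    print(f"summary downlink time: {downlink_seconds(summary_bytes, rate_bps):.6f} s")
```

Under these assumptions the summary is several orders of magnitude smaller than the raw image, which is the sense in which downlinking descriptions instead of pixels changes the bandwidth profile.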
