OlmoEarth v1.1: A more efficient family of models

We released OlmoEarth (v1) in November 2025. Since then, partners have applied it to a wide range of tasks, from tracking mangrove change to classifying drivers of forest loss to producing country-scale crop-type maps in days, and have scaled deployments to national, continental, and global areas. Every release moves us closer to our mission: bringing state-of-the-art AI to organizations and communities working to protect people and our planet.

When OlmoEarth processes satellite imagery to make predictions across tens to hundreds of thousands of square kilometers, efficiency shapes what's possible. Over the full lifecycle of running OlmoEarth (data export, preprocessing, inference, and post-processing), compute is by far the highest cost. A more efficient model means we can support more partners on the OlmoEarth Platform, and anyone running OlmoEarth on their own can use it faster and at lower expense. That's why we built OlmoEarth v1.1: a new family of models that cuts compute costs by up to 3x while maintaining OlmoEarth v1's performance on a mix of research benchmarks and tasks we've constructed with partners.

Increasing efficiency by decreasing sequence lengths

The OlmoEarth models are transformer-based, one of the dominant architectures in machine learning today. To process remote sensing data, we first convert it into a sequence of tokens the model can ingest. Two important levers control efficiency in transformer-based models: model size (this is why we release a family of models, so users can pick the size that fits their compute budget) and token sequence length. The cost of self-attention scales quadratically with the token sequence length, so even small reductions can meaningfully cut the cost of running the model.
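To make the scaling concrete, here is a rough per-layer FLOP estimate for a standard transformer block. It is a back-of-the-envelope sketch, not a profile of OlmoEarth: the function name, the hidden size of 768, the MLP expansion ratio of 4, and the sequence lengths are all illustrative assumptions.

```python
def layer_flops(seq_len: int, d_model: int, mlp_ratio: int = 4) -> dict:
    """Approximate forward-pass FLOPs for one transformer layer.

    Counts each multiply-add as 2 FLOPs and ignores softmax, layer norms,
    and biases. All sizes passed in below are hypothetical.
    """
    # Q, K, V, and output projections: four matmuls of 2 * n * d * d each.
    projections = 8 * seq_len * d_model**2
    # MLP: two matmuls of 2 * n * d * (mlp_ratio * d) each.
    mlp = 4 * mlp_ratio * seq_len * d_model**2
    # QK^T scores and the attention-weighted sum of V: 2 * n^2 * d each.
    attention = 4 * seq_len**2 * d_model
    return {"linear": projections + mlp, "attention": attention}


if __name__ == "__main__":
    long_seq = layer_flops(seq_len=3000, d_model=768)
    short_seq = layer_flops(seq_len=1000, d_model=768)
    # The linear term shrinks in proportion to the sequence length,
    # while the attention term shrinks with its square.
    print("linear ratio:", long_seq["linear"] / short_seq["linear"])
    print("attention ratio:", long_seq["attention"] / short_seq["attention"])
```

In this estimate, a sequence three times shorter cuts the attention term by 9x and the linear term by 3x. End-to-end savings also depend on parts of the pipeline the estimate ignores, such as data loading and patch embedding.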

Designing the token

This raises an important question for transformer-based remote sensing models: what should a token represent? Take Sentinel-2 imagery, a common modality we process. A Sentinel-2 input is a tensor with a height and width (H and W, the number of pixels along the latitudinal and longitudinal axes), a temporal dimension T, and 12 Sentinel-2 channels, giving a shape of [H, W, T, D=12]. Currently, we split the data into resolution-based patches. Concretely, we pick a spatial patch size p and split the overall Sentinel-2 image into patches of size p x p.
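The spatial split can be sketched as a grid of slices over the H and W axes. This is a minimal illustration that assumes H and W are exact multiples of p; the function name `patch_slices` and the sizes used are hypothetical and are not taken from the OlmoEarth codebase.

```python
def patch_slices(height: int, width: int, p: int) -> list:
    """Return (row_slice, col_slice) pairs covering an image with p x p patches.

    Assumes height and width are exact multiples of p (an assumption of this
    sketch; real pipelines may pad or crop instead).
    """
    if height % p or width % p:
        raise ValueError("height and width must be multiples of the patch size")
    return [
        (slice(row, row + p), slice(col, col + p))
        for row in range(0, height, p)
        for col in range(0, width, p)
    ]


if __name__ == "__main__":
    # A hypothetical 8 x 8 pixel tile with patch size 4 gives a 2 x 2 grid.
    for rows, cols in patch_slices(8, 8, 4):
        print(rows, cols)
```

Indexing an array of shape [H, W, T, D] with one of these pairs, as in `x[rows, cols]`, would give a block of shape [p, p, T, D]: every timestep and channel for that patch.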

For each patch, we create one token per timestep per resolution. So a Sentinel-2 input with 2 timesteps yields 6 tokens per patch (2 timesteps x 3 resolutions: 10m, 20m, and 60m). In total, a [H, W, T, D=12] Sentinel-2 input yields H/p x W/p x T x 3 tokens. Using a separate token per resolution is a common technique when processing Sentinel-2 data: Galileo and SatMAE both take this approach, and SatMAE shows significantly better results when doing so. However, it is not universal: CROMA uses a single token for all bands, regardless of resolution.
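The token count above is simple arithmetic, sketched here as a small helper. The function name and the example tile size are illustrative assumptions; only the formula H/p x W/p x T x 3 comes from the text.

```python
def sentinel2_token_count(
    height: int,
    width: int,
    timesteps: int,
    patch_size: int,
    tokens_per_patch_timestep: int = 3,
) -> int:
    """Number of tokens for a [H, W, T, D=12] Sentinel-2 input.

    tokens_per_patch_timestep is 3 when each resolution (10m, 20m, 60m)
    gets its own token, and 1 when all bands share a single token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    patches = (height // patch_size) * (width // patch_size)
    return patches * timesteps * tokens_per_patch_timestep


if __name__ == "__main__":
    # One patch, two timesteps, three resolutions: the 6 tokens from the text.
    print(sentinel2_token_count(8, 8, timesteps=2, patch_size=8))
    # A hypothetical 128 x 128 tile with 12 timesteps and patch size 8.
    print(sentinel2_token_count(128, 128, timesteps=12, patch_size=8))
```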

Because token counts compound multiplicatively, collapsing the three resolutions into a single token cuts the token count to one third, with material savings across pretraining, fine-tuning, and inference. However, naively combining the tokens in this way leads to significant performance drops, including a 10 percentage point drop on m-eurosat kNN (a common benchmark task for remote sensing models). We hypothesize that separating Sentinel-2 bands into different tokens makes it easier for OlmoEarth to model important cross-band relationships. Merging tokens without hurting performance required us to modify our pretraining regimen. We describe those changes in detail in our paper.
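The difference between the two token designs can be sketched by listing which bands each token in a patch would carry. The grouping below follows the native resolutions of the Sentinel-2 bands and is an assumption of this sketch; the exact band sets and ordering used by OlmoEarth may differ, and the function name is hypothetical.

```python
# Sentinel-2 bands grouped by native resolution (12 bands in total).
# This grouping is an assumption for illustration only.
S2_BANDS_BY_RESOLUTION = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}


def tokens_for_patch(timesteps: int, merge_resolutions: bool) -> list:
    """List the (timestep, bands) pairs that become tokens for one patch."""
    tokens = []
    for t in range(timesteps):
        if merge_resolutions:
            # One token per timestep carrying all 12 bands.
            all_bands = [
                band
                for bands in S2_BANDS_BY_RESOLUTION.values()
                for band in bands
            ]
            tokens.append((t, all_bands))
        else:
            # One token per timestep per resolution group.
            for bands in S2_BANDS_BY_RESOLUTION.values():
                tokens.append((t, list(bands)))
    return tokens


if __name__ == "__main__":
    print(len(tokens_for_patch(timesteps=2, merge_resolutions=False)))
    print(len(tokens_for_patch(timesteps=2, merge_resolutions=True)))
```

With per-resolution tokens, the model sees each band group as a separate sequence element. In the merged design, all bands for a patch and timestep sit in one token, and the sequence is three times shorter.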

For developers

The result is a model family that does more with less. At every size, OlmoEarth v1.1 costs up to three times less to run than OlmoEarth v1, making frequent, planet-scale map refreshes more affordable for every team running OlmoEarth. If you're using a model from the original OlmoEarth family, try OlmoEarth v1.1. It provides similar performance to OlmoEarth v1 while requiring as little as one third of the compute, though we have seen some regressions (see our technical report for more details). If it works for your task, you should see a significant speedup during fine-tuning and inference.

For researchers

Pretrained remote sensing models have many degrees of freedom, which makes them hard to study. When performance shifts, is it the architecture, the dataset, or the pretraining algorithm? We trained OlmoEarth v1.1 on the same dataset as OlmoEarth v1, so any differences between the two isolate the effect of the methodological changes. We hope this advances the scientific understanding of how to pretrain models for remote sensing.

Get started

Check out the OlmoEarth v1.1 weights and training code, including the weights for our Base, Tiny, and Nano models.