arxiv:2508.21402

SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing

Published on Aug 29

Abstract

SatDINO, a contrastive self-supervised model tailored for satellite imagery, outperforms MAE-based methods and achieves competitive results across multiple benchmarks. The authors also propose enhancements such as ground sample distance (GSD) encoding and adaptive view sampling.

AI-generated summary

Self-supervised learning has emerged as a powerful tool for remote sensing, where large amounts of unlabeled data are available. In this work, we investigate the use of DINO, a contrastive self-supervised method, for pretraining on remote sensing imagery. We introduce SatDINO, a model tailored for representation learning in satellite imagery. Through extensive experiments across multiple datasets and evaluation setups, we demonstrate that SatDINO outperforms state-of-the-art methods based on the far more common masked autoencoders (MAE) and achieves competitive results on multiple benchmarks. We also provide a rigorous ablation study evaluating SatDINO's individual components. Finally, we propose a few novel enhancements, such as a new way to incorporate ground sample distance (GSD) encoding and adaptive view sampling. These enhancements can be applied independently to our SatDINO model. Our code and trained models are available at: https://github.com/strakaj/SatDINO.