SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing

These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the DINO framework and adapts it to the unique characteristics of remote sensing data.

[ Paper ], [ GitHub ]

Pretrained models

The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks.

| arch  | patch size | params (M) | GFLOPs | linear acc. | Hugging Face                  | weights | weights (finetuned) |
|-------|------------|------------|--------|-------------|-------------------------------|---------|---------------------|
| ViT-S | 16         | 21.59      | 8.54   | 72.75       | strakajk/satdino-vit_small-16 | ckp     | ckp                 |
| ViT-S | 8          | 21.37      | 33.56  | 73.53       | strakajk/satdino-vit_small-8  | ckp     | ckp                 |
| ViT-B | 16         | 85.65      | 33.90  | 73.52       | strakajk/satdino-vit_base-16  | ckp     | ckp                 |

Create from HF

You can load the model via Hugging Face Transformers or from the official GitHub repository.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

# predict
x = torch.randn(1, 3, 224, 224)
y = model(x)  # out: torch.Size([1, 384])
```
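The frozen embedding can be fed directly to a downstream classifier, as in the kNN evaluation reported below. A minimal sketch using scikit-learn (the random tensors and labels are placeholders standing in for real SatDINO features, not the official evaluation pipeline):

```python
import torch
from sklearn.neighbors import KNeighborsClassifier

# Placeholder embeddings: in practice these would come from the frozen
# SatDINO backbone, e.g. feats = model(images) -> 384-dim vectors for ViT-S.
torch.manual_seed(0)
train_feats = torch.randn(100, 384)
train_labels = torch.randint(0, 10, (100,))  # 10 hypothetical classes
test_feats = torch.randn(20, 384)

# Fit a kNN classifier on the frozen features and predict test labels.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_feats.numpy(), train_labels.numpy())
preds = knn.predict(test_feats.numpy())
```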

Results

| Dataset   | SatDINO8 | SatDINO16 | Scale-MAE | SatMAE |
|-----------|----------|-----------|-----------|--------|
| EuroSAT   | 87.72    | 85.96     | 85.42     | 81.43  |
| RESISC45  | 85.29    | 82.32     | 79.96     | 65.96  |
| UC Merced | 94.82    | 93.21     | 84.58     | 78.45  |
| WHU-RS19  | 98.18    | 97.82     | 89.32     | 86.41  |
| RS-C11    | 96.91    | 96.61     | 93.03     | 83.96  |
| SIRI-WHU  | 91.82    | 87.19     | 84.84     | 77.76  |

Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).


| Dataset   | ViT-S/16 | ViT-S/8 | ViT-B |
|-----------|----------|---------|-------|
| EuroSAT   | 98.69    | 98.76   | 98.83 |
| RESISC45  | 95.68    | 95.16   | 96.05 |
| UC Merced | 98.33    | 98.81   | 98.57 |
| WHU-RS19  | 98.54    | 98.06   | 97.57 |
| RS-C11    | 98.01    | 96.81   | 96.02 |
| SIRI-WHU  | 98.54    | 97.08   | 97.08 |

SatDINO fine-tuning classification accuracy.
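Both the linear and fine-tuning protocols attach a classification head to the pretrained backbone; the lighter linear-probe variant keeps the backbone frozen and trains only the head. A minimal sketch of that setup (the `LinearProbe` class and the flatten-based stand-in backbone are illustrative, not the official recipe; a real run would pass the pretrained SatDINO model as `backbone`):

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen feature extractor + trainable linear classification head."""
    def __init__(self, backbone, dim=384, num_classes=10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the pretrained backbone
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            feats = self.backbone(x)
        return self.head(feats)

# Stand-in backbone producing 384-dim features, mimicking ViT-S output.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 384))
probe = LinearProbe(backbone)
logits = probe(torch.randn(2, 3, 224, 224))  # shape: (2, 10)
```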


| Model     | Backbone     | Potsdam 224² | Potsdam 512² | Vaihingen 224² | Vaihingen 512² | LoveDA 224² | LoveDA 512² |
|-----------|--------------|--------------|--------------|----------------|----------------|-------------|-------------|
| SatMAE    | ViT-Large    | 67.88        | 70.39        | 64.81          | 69.13          | 46.28       | 52.28       |
| Scale-MAE | ViT-Large    | 69.74        | 72.21        | 67.97          | 71.65          | 49.37       | 53.70       |
| SatDINO   | ViT-Small/16 | 67.93        | 71.80        | 63.38          | 68.32          | 44.77       | 49.65       |
| SatDINO   | ViT-Small/8  | 70.71        | 71.45        | 68.69          | 67.71          | 47.53       | 50.20       |
| SatDINO   | ViT-Base     | 67.65        | 71.63        | 64.85          | 69.37          | 44.25       | 50.08       |

Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU).
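mIoU averages the per-class intersection-over-union between predicted and ground-truth masks. A minimal reference implementation of the metric (not the authors' evaluation code, which may handle ignore labels and absent classes differently):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Tiny 2x2 example: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))
```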

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you find this repository useful, please consider citing it:

@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing}, 
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402}, 
}