# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the DINO framework and adapts it to the unique characteristics of remote sensing data.
## Pretrained models
The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks.
arch | patch size | params. (M) | GFLOPs | linear probe acc. (%) | Hugging Face | weights | weights-finetune |
---|---|---|---|---|---|---|---|
ViT-S | 16 | 21.59 | 8.54 | 72.75 | strakajk/satdino-vit_small-16 | ckp | ckp |
ViT-S | 8 | 21.37 | 33.56 | 73.53 | strakajk/satdino-vit_small-8 | ckp | ckp |
ViT-B | 16 | 85.65 | 33.90 | 73.52 | strakajk/satdino-vit_base-16 | ckp | ckp |
## Create from Hugging Face
You can create a model from Hugging Face, or build it from the official GitHub repository.
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

# predict an embedding for a dummy input
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    y = model(x)  # out: torch.Size([1, 384])
```
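For real imagery you will also need a preprocessing pipeline. Below is a minimal sketch, assuming the model expects 224×224 RGB inputs with ImageNet-style normalization; the normalization constants and the input file name are assumptions not confirmed by this card, so verify them against the official repository:

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumed preprocessing: resize/crop to 224 and ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("scene.png").convert("RGB")  # hypothetical input file
x = preprocess(img).unsqueeze(0)              # [1, 3, 224, 224]
with torch.no_grad():
    emb = model(x)                            # [1, 384] for ViT-S
```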
## Results
Dataset | SatDINO (ViT-S/8) | SatDINO (ViT-S/16) | Scale-MAE | SatMAE |
---|---|---|---|---|
EuroSAT | 87.72 | 85.96 | 85.42 | 81.43 |
RESISC45 | 85.29 | 82.32 | 79.96 | 65.96 |
UC Merced | 94.82 | 93.21 | 84.58 | 78.45 |
WHU-RS19 | 98.18 | 97.82 | 89.32 | 86.41 |
RS-C11 | 96.91 | 96.61 | 93.03 | 83.96 |
SIRI-WHU | 91.82 | 87.19 | 84.84 | 77.76 |
Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).
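As an illustration of how such a kNN evaluation can be reproduced, here is a minimal sketch using scikit-learn on embeddings from the frozen backbone loaded above. The data loaders, the choice of k, and the cosine metric are assumptions for illustration, not the paper's confirmed protocol:

```python
import numpy as np
import torch
from sklearn.neighbors import KNeighborsClassifier

@torch.no_grad()
def extract_features(model, loader):
    """Embed every image in `loader` with the frozen backbone."""
    feats, labels = [], []
    for images, targets in loader:   # loader yields [B, 3, 224, 224] batches
        emb = model(images)          # [B, 384] embeddings for ViT-S
        feats.append(emb.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# train_loader / test_loader are hypothetical DataLoaders over a benchmark
# such as EuroSAT; k=20 with cosine distance is an assumed setting.
X_train, y_train = extract_features(model, train_loader)
X_test, y_test = extract_features(model, test_loader)

knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(X_train, y_train)
print("kNN accuracy:", knn.score(X_test, y_test))
```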
Dataset | ViT-Small16 | ViT-Small8 | ViT-Base |
---|---|---|---|
EuroSAT | 98.69 | 98.76 | 98.83 |
RESISC45 | 95.68 | 95.16 | 96.05 |
UC Merced | 98.33 | 98.81 | 98.57 |
WHU-RS19 | 98.54 | 98.06 | 97.57 |
RS-C11 | 98.01 | 96.81 | 96.02 |
SIRI-WHU | 98.54 | 97.08 | 97.08 |
SatDINO fine-tuning classification accuracy.
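A minimal fine-tuning sketch: attach a linear classification head to the 384-dimensional ViT-S embedding and train end to end. The head, optimizer, learning rate, and class count below are illustrative assumptions, not the paper's recipe:

```python
import torch
import torch.nn as nn

class SatDINOClassifier(nn.Module):
    """Backbone + linear head; a hypothetical wrapper, not the official one."""
    def __init__(self, backbone, embed_dim=384, num_classes=10):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

clf = SatDINOClassifier(model, num_classes=10)            # e.g. EuroSAT has 10 classes
optimizer = torch.optim.AdamW(clf.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = nn.CrossEntropyLoss()

clf.train()
for images, targets in train_loader:   # hypothetical DataLoader
    optimizer.zero_grad()
    loss = criterion(clf(images), targets)
    loss.backward()
    optimizer.step()
```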
Model | Backbone | Potsdam 224² | Potsdam 512² | Vaihingen 224² | Vaihingen 512² | LoveDA 224² | LoveDA 512² |
---|---|---|---|---|---|---|---|
SatMAE | ViT-Large | 67.88 | 70.39 | 64.81 | 69.13 | 46.28 | 52.28 |
Scale-MAE | ViT-Large | 69.74 | 72.21 | 67.97 | 71.65 | 49.37 | 53.70 |
SatDINO | ViT-Small16 | 67.93 | 71.80 | 63.38 | 68.32 | 44.77 | 49.65 |
SatDINO | ViT-Small8 | 70.71 | 71.45 | 68.69 | 67.71 | 47.53 | 50.20 |
SatDINO | ViT-Base | 67.65 | 71.63 | 64.85 | 69.37 | 44.25 | 50.08 |
Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU).
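For dense prediction the pooled [1, 384] embedding is not enough; you need per-patch tokens. Below is a sketch of a common linear segmentation probe for ViT features, assuming the backbone exposes DINO's `get_intermediate_layers` (SatDINO builds upon DINO, but verify this method exists in the Hugging Face wrapper before relying on it); the class count is the 6 ISPRS Potsdam classes, and the decoder is not the paper's confirmed one:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    """Per-patch linear classifier upsampled to input resolution
    (a common probe for ViT features, not the paper's confirmed decoder)."""
    def __init__(self, backbone, embed_dim=384, num_classes=6, patch_size=16):
        super().__init__()
        self.backbone = backbone
        self.patch_size = patch_size
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        B, _, H, W = x.shape
        # Assumption: DINO-style API returning [B, 1 + N, D] token maps.
        tokens = self.backbone.get_intermediate_layers(x, n=1)[0]
        patch_tokens = tokens[:, 1:, :]                 # drop CLS token
        h, w = H // self.patch_size, W // self.patch_size
        feat = patch_tokens.transpose(1, 2).reshape(B, -1, h, w)
        logits = self.classifier(feat)                  # [B, C, h, w]
        return F.interpolate(logits, size=(H, W), mode="bilinear",
                             align_corners=False)
```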
## License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
## Citation
If you find this repository useful, please consider citing it:
```bibtex
@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing},
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402},
}
```