# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the DINO framework and adapts it to the unique characteristics of remote sensing data.
## Pretrained models
The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks.
arch | patch size | params. (M) | GFLOPs | linear probe acc. (%) | Hugging Face | weights | weights-finetune |
---|---|---|---|---|---|---|---|
ViT-S | 16 | 21.59 | 8.54 | 72.75 | strakajk/satdino-vit_small-16 | ckp | ckp |
ViT-S | 8 | 21.37 | 33.56 | 73.53 | strakajk/satdino-vit_small-8 | ckp | ckp |
ViT-B | 16 | 85.65 | 33.90 | 73.52 | strakajk/satdino-vit_base-16 | ckp | ckp |
## Create from Hugging Face
You can create a model from Hugging Face, or build it from the official GitHub repository.
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

# predict an embedding for a dummy input
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    y = model(x)  # out: torch.Size([1, 384])
```
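For real imagery you will also need a preprocessing pipeline. Below is a minimal sketch, assuming the model expects 224×224 RGB inputs with ImageNet-style normalization; the normalization constants and the input file name are assumptions not confirmed by this card, so verify them against the official repository:

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumed preprocessing: resize/crop to 224 and ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("scene.png").convert("RGB")  # hypothetical input file
x = preprocess(img).unsqueeze(0)              # [1, 3, 224, 224]
with torch.no_grad():
    emb = model(x)                            # [1, 384] for ViT-S
```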
## Results
Dataset | SatDINO (ViT-S/8) | SatDINO (ViT-S/16) | Scale-MAE | SatMAE |
---|---|---|---|---|
EuroSAT | 87.72 | 85.96 | 85.42 | 81.43 |
RESISC45 | 85.29 | 82.32 | 79.96 | 65.96 |
UC Merced | 94.82 | 93.21 | 84.58 | 78.45 |
WHU-RS19 | 98.18 | 97.82 | 89.32 | 86.41 |
RS-C11 | 96.91 | 96.61 | 93.03 | 83.96 |
SIRI-WHU | 91.82 | 87.19 | 84.84 | 77.76 |
Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).
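As an illustration of how such a kNN evaluation can be reproduced, here is a minimal sketch using scikit-learn on embeddings from the frozen backbone loaded above. The data loaders, the choice of k, and the cosine metric are assumptions for illustration, not the paper's confirmed protocol:

```python
import numpy as np
import torch
from sklearn.neighbors import KNeighborsClassifier

@torch.no_grad()
def extract_features(model, loader):
    """Embed every image in `loader` with the frozen backbone."""
    feats, labels = [], []
    for images, targets in loader:   # loader yields [B, 3, 224, 224] batches
        emb = model(images)          # [B, 384] embeddings for ViT-S
        feats.append(emb.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# train_loader / test_loader are hypothetical DataLoaders over a benchmark
# such as EuroSAT; k=20 with cosine distance is an assumed setting.
X_train, y_train = extract_features(model, train_loader)
X_test, y_test = extract_features(model, test_loader)

knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(X_train, y_train)
print("kNN accuracy:", knn.score(X_test, y_test))
```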
Dataset | ViT-Small16 | ViT-Small8 | ViT-Base |
---|---|---|---|
EuroSAT | 98.69 | 98.76 | 98.83 |
RESISC45 | 95.68 | 95.16 | 96.05 |
UC Merced | 98.33 | 98.81 | 98.57 |
WHU-RS19 | 98.54 | 98.06 | 97.57 |
RS-C11 | 98.01 | 96.81 | 96.02 |
SIRI-WHU | 98.54 | 97.08 | 97.08 |
SatDINO fine-tuning classification accuracy.
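A minimal fine-tuning sketch: attach a linear classification head to the 384-dimensional ViT-S embedding and train end to end. The head, optimizer, learning rate, and class count below are illustrative assumptions, not the paper's recipe:

```python
import torch
import torch.nn as nn

class SatDINOClassifier(nn.Module):
    """Backbone + linear head; a hypothetical wrapper, not the official one."""
    def __init__(self, backbone, embed_dim=384, num_classes=10):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

clf = SatDINOClassifier(model, num_classes=10)            # e.g. EuroSAT has 10 classes
optimizer = torch.optim.AdamW(clf.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = nn.CrossEntropyLoss()

clf.train()
for images, targets in train_loader:   # hypothetical DataLoader
    optimizer.zero_grad()
    loss = criterion(clf(images), targets)
    loss.backward()
    optimizer.step()
```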
Model | Backbone | Potsdam 224² | Potsdam 512² | Vaihingen 224² | Vaihingen 512² | LoveDA 224² | LoveDA 512² |
---|---|---|---|---|---|---|---|
SatMAE | ViT-Large | 67.88 | 70.39 | 64.81 | 69.13 | 46.28 | 52.28 |
Scale-MAE | ViT-Large | 69.74 | 72.21 | 67.97 | 71.65 | 49.37 | 53.70 |
SatDINO | ViT-Small16 | 67.93 | 71.80 | 63.38 | 68.32 | 44.77 | 49.65 |
SatDINO | ViT-Small8 | 70.71 | 71.45 | 68.69 | 67.71 | 47.53 | 50.20 |
SatDINO | ViT-Base | 67.65 | 71.63 | 64.85 | 69.37 | 44.25 | 50.08 |
Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU).
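For dense prediction the pooled [1, 384] embedding is not enough; you need per-patch tokens. Below is a sketch of a common linear segmentation probe for ViT features, assuming the backbone exposes DINO's `get_intermediate_layers` (SatDINO builds upon DINO, but verify this method exists in the Hugging Face wrapper before relying on it); the class count is the 6 ISPRS Potsdam classes, and the decoder is not the paper's confirmed one:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    """Per-patch linear classifier upsampled to input resolution
    (a common probe for ViT features, not the paper's confirmed decoder)."""
    def __init__(self, backbone, embed_dim=384, num_classes=6, patch_size=16):
        super().__init__()
        self.backbone = backbone
        self.patch_size = patch_size
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        B, _, H, W = x.shape
        # Assumption: DINO-style API returning [B, 1 + N, D] token maps.
        tokens = self.backbone.get_intermediate_layers(x, n=1)[0]
        patch_tokens = tokens[:, 1:, :]                 # drop CLS token
        h, w = H // self.patch_size, W // self.patch_size
        feat = patch_tokens.transpose(1, 2).reshape(B, -1, h, w)
        logits = self.classifier(feat)                  # [B, C, h, w]
        return F.interpolate(logits, size=(H, W), mode="bilinear",
                             align_corners=False)
```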
## License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
## Citation
If you find this repository useful, please consider citing it:
```bibtex
@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing},
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402},
}
```