CLOSP-VL

CLOSP (Contrastive Language Optical SAR Pretraining) is a multimodal architecture designed for text-to-image retrieval. It creates a unified embedding space for text, Sentinel-2 (MSI), and Sentinel-1 (SAR) data. The CLOSP-VL variant uses a ViT-large vision backbone.
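
As a rough illustration of what retrieval in a shared embedding space involves, the sketch below ranks a small gallery of image embeddings against a text embedding by cosine similarity. It is a minimal sketch, not the model's actual API: the function names `cosine_similarity` and `rank_images`, the tile ids, and the 3-dimensional vectors are all hypothetical and chosen for illustration only. Cosine similarity is assumed as the scoring function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the text query."""
    scores = {
        image_id: cosine_similarity(text_embedding, embedding)
        for image_id, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, made up for illustration; these are not model outputs.
query = [1.0, 0.0, 0.0]
gallery = {
    "msi_tile": [0.9, 0.1, 0.0],
    "sar_tile": [0.0, 1.0, 0.0],
    "other_tile": [-1.0, 0.0, 0.0],
}
ranking = rank_images(query, gallery)
```

Because MSI and SAR embeddings live in the same space as the text embeddings, a single gallery can mix tiles from both sensors and be ranked against one query.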

Model Details

The model uses three separate encoders: one for text, one for Sentinel-1 (SAR) data, and one for Sentinel-2 (MSI) data. During training, it uses a contrastive objective to align the textual embeddings with the corresponding visual embeddings (either SAR or MSI).
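
The contrastive alignment described above can be sketched with a CLIP-style symmetric InfoNCE loss. This is an assumption: the card only says "a contrastive objective", so the exact loss, the temperature value of 0.07, and the function names below are illustrative rather than the authors' implementation. Each row of `image_embs` stands for the visual embedding (SAR or MSI) paired with the text at the same index.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_z for v in row]


def contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric InfoNCE: matching pairs share the same batch index."""
    text = [l2_normalize(t) for t in text_embs]
    image = [l2_normalize(i) for i in image_embs]
    # Pairwise cosine similarities, scaled by the (assumed) temperature.
    logits = [
        [sum(a * b for a, b in zip(t, i)) / temperature for i in image]
        for t in text
    ]
    n = len(logits)
    # Text-to-image direction: each text should pick its own image.
    t2i = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Image-to-text direction: same computation on the transposed matrix.
    columns = [list(col) for col in zip(*logits)]
    i2t = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (t2i + i2t)


# Toy batch: correctly paired embeddings versus deliberately swapped ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is low when each text embedding is closest to its own image embedding and high when pairs are mismatched, which is what pulls the three encoders toward one shared space.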

  • Developed by: Daniele Rege Cambrin
  • Model type: CLOSP
  • Language(s) (NLP): English
  • License: OpenRAIL
  • Finetuned from model: [More Information Needed]
  • Repository: GitHub
  • Paper: arXiv:2507.10403 (https://arxiv.org/abs/2507.10403)

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModel
model = AutoModel.from_pretrained("DarthReca/CLOSP-VL", trust_remote_code=True)  # trust_remote_code runs the repository's custom model code

Citation

@misc{cambrin2025texttoremotesensingimageretrievalrgbsources,
      title={Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources}, 
      author={Daniele Rege Cambrin and Lorenzo Vaiani and Giuseppe Gallipoli and Luca Cagliero and Paolo Garza},
      year={2025},
      eprint={2507.10403},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.10403}, 
}

Licensing

The training data is a compilation of multiple sources, each with its own license. For detailed licensing information on each component, see the NOTICE.md file.

Model size: 632M params (Safetensors, tensor type F32)