---
license: creativeml-openrail-m
datasets:
- DarthReca/crisislandmark
language:
- en
library_name: transformers
tags:
- remote-sensing
- text-to-image-retrieval
- multimodal
- geospatial
- SAR
- multispectral
- crisis-management
- earth-observation
- contrastive-learning
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# CLOSP-VL

CLOSP (Contrastive Language Optical SAR Pretraining) is a multimodal architecture designed for text-to-image retrieval. It creates a unified embedding space for text, Sentinel-2 multispectral (MSI) imagery, and Sentinel-1 SAR imagery. The CLOSP-VL variant uses a ViT-Large vision backbone.

## Model Details

The model uses three separate encoders: one for text, one for Sentinel-1 (SAR) data, and one for Sentinel-2 (MSI) data. During training, a contrastive objective aligns the textual embeddings with the corresponding visual embeddings (either SAR or MSI), so that captions and images of the same scene end up close together in the shared embedding space.

- **Developed by:** Daniele Rege Cambrin
- **Model type:** CLOSP
- **Language(s) (NLP):** English
- **License:** OpenRAIL
- **Finetuned from model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) (text encoder)
- **Repository:** [GitHub](https://github.com/DarthReca/closp)
- **Paper:** [arXiv](https://arxiv.org/abs/2507.10403)

## How to Get Started with the Model

Use the code below to load the model. An illustrative retrieval sketch is provided at the end of this card.

```python
from transformers import AutoModel

# trust_remote_code is required because CLOSP ships its own model code
model = AutoModel.from_pretrained("DarthReca/CLOSP-VL", trust_remote_code=True)
```

## Citation

```bibtex
@misc{cambrin2025texttoremotesensingimageretrievalrgbsources,
  title={Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources},
  author={Daniele Rege Cambrin and Lorenzo Vaiani and Giuseppe Gallipoli and Luca Cagliero and Paolo Garza},
  year={2025},
  eprint={2507.10403},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.10403},
}
```

## Licensing

The training data is a compilation of multiple sources, each with its own license. For detailed information on the licensing of each component, please see the [**NOTICE.md**](NOTICE.md) file.
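
## Retrieval Sketch

The card above describes a contrastive objective that aligns text embeddings with SAR/MSI embeddings and then uses the shared space for text-to-image retrieval. The snippet below is a minimal, illustrative sketch of both mechanics on random stand-in embeddings; it is **not** the CLOSP training code, and the embedding dimension, temperature, and the way real embeddings are obtained from the model are assumptions. See the paper and the [GitHub repository](https://github.com/DarthReca/closp) for the actual encoding interface and loss.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the embeddings produced by the CLOSP encoders.
# How to obtain real embeddings is defined by the model's remote code.
dim = 384                          # hypothetical embedding size
text_emb = torch.randn(8, dim)     # 8 captions
image_emb = torch.randn(8, dim)    # 8 matching Sentinel-1/2 images (same order)

# L2-normalise so that dot products are cosine similarities.
text_emb = F.normalize(text_emb, dim=-1)
image_emb = F.normalize(image_emb, dim=-1)

# --- Contrastive alignment (generic CLIP-style illustration) ---------------
# Matching caption/image pairs sit on the diagonal of the similarity matrix.
temperature = 0.07                                # hypothetical value
logits = text_emb @ image_emb.T / temperature
targets = torch.arange(logits.size(0))
loss = (F.cross_entropy(logits, targets) +
        F.cross_entropy(logits.T, targets)) / 2
print(f"contrastive loss: {loss.item():.3f}")

# --- Text-to-image retrieval ------------------------------------------------
# Rank all images by cosine similarity to a single text query.
query = text_emb[:1]                              # (1, dim)
scores = query @ image_emb.T                      # (1, num_images)
ranking = scores.argsort(dim=-1, descending=True)
print("retrieval ranking:", ranking.tolist())
```

With real embeddings in place of the random tensors, the same ranking step returns the Sentinel-1/Sentinel-2 scenes most relevant to a text query.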