GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset Paper • 2507.21033 • Published 10 days ago • 20
ForCenNet: Foreground-Centric Network for Document Image Rectification Paper • 2507.19804 • Published 12 days ago • 11
Region-based Cluster Discrimination for Visual Representation Learning Paper • 2507.20025 • Published 12 days ago • 17
DatologyAI CLIP Models Collection State-of-the-art image-text classification and retrieval models built using only data curation. For full details, see our blog: https://blog.datologyai.com/ • 2 items • Updated Jun 10 • 5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 68
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 176
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 62
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24 • 114
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published Apr 24 • 39
Decoupled Global-Local Alignment for Improving Compositional Understanding Paper • 2504.16801 • Published Apr 23 • 15
UniME Collection UniME is a series of multimodal large language models trained to learn universal multimodal embeddings. • 4 items • Updated May 16 • 4
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146