Collections

Collections including paper arxiv:2504.16064

about 24 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22, 2025 • 14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Paper • 2504.14032 • Published Apr 18, 2025 • 4
Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21, 2025 • 159
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025 • 115

Paper Trial - 2025

2025 Weekly Paper Reading Challenge: This is a journey of reading and sharing insightful research papers throughout 2025.

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Paper • 2412.06781 • Published Dec 9, 2024 • 24
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 298
Do computer vision foundation models learn the low-level characteristics of the human visual system?

Paper • 2502.20256 • Published Feb 27, 2025
High-Resolution Building and Road Detection from Sentinel-2

Paper • 2310.11622 • Published Oct 17, 2023

about 10 hours ago

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 63
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 77

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22, 2025 • 14

generative-model-training

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Paper • 2310.00426 • Published Sep 30, 2023 • 60
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 47
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

Paper • 2310.16825 • Published Oct 25, 2023 • 36
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21, 2024 • 23

