Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.16915

about 4 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Interesting Papers

ReZero: Enhancing LLM search ability by trying one-more-time

Paper • 2504.11001 • Published Apr 15 • 14
FonTS: Text Rendering with Typography and Style Controls

Paper • 2412.00136 • Published Nov 28, 2024 • 1
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 154

OPen data for VLM

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121
DreamO: A Unified Framework for Image Customization

Paper • 2504.16915 • Published Apr 23 • 24
An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 63

about 11 hours ago

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 76

AI Tools for Art - April & May '25

Tools & models from the upcoming 4rd issue of AI Tools for Art 🎉 read more: https://open.substack.com/pub/multimodalaiart

FramePack Video Generation

Collection

fast & compact video generation with FramePack - a next-frame prediction neural network structure that generates videos progressively • 9 items • Updated May 26 • 6
LTX Video 0.9.7&0.9.8

Collection

LTX Video 13B and 13B distilled models for video generation by Lightricks • 5 items • Updated Aug 13
ByteDance/DreamO

Updated Jun 24 • 99
Running on Zero

592

592

DreamO

🐨

A Unified Framework for Image Customization

Diffusion Model Control

Control Methods for Diffusion and Score Models

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Paper • 2412.04146 • Published Dec 5, 2024 • 23
Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 36
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published Dec 6, 2024 • 12

alvdansen/flux-koda

Text-to-Image • Updated Aug 16, 2024 • 2.35k • • 244
black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Jun 27 • 1.67M • • 11.4k
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 213k • 3.5k
black-forest-labs/FLUX.1-Fill-dev

Updated Jun 27 • 305k • 920

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 12
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 14
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 49

about 4 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

AI Tools for Art - April & May '25

Tools & models from the upcoming 4rd issue of AI Tools for Art 🎉 read more: https://open.substack.com/pub/multimodalaiart

FramePack Video Generation

Collection

fast & compact video generation with FramePack - a next-frame prediction neural network structure that generates videos progressively • 9 items • Updated May 26 • 6
LTX Video 0.9.7&0.9.8

Collection

LTX Video 13B and 13B distilled models for video generation by Lightricks • 5 items • Updated Aug 13
ByteDance/DreamO

Updated Jun 24 • 99
Running on Zero

592

592

DreamO

🐨

A Unified Framework for Image Customization

Interesting Papers

ReZero: Enhancing LLM search ability by trying one-more-time

Paper • 2504.11001 • Published Apr 15 • 14
FonTS: Text Rendering with Typography and Style Controls

Paper • 2412.00136 • Published Nov 28, 2024 • 1
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 154

Diffusion Model Control

Control Methods for Diffusion and Score Models

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Paper • 2412.04146 • Published Dec 5, 2024 • 23
Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 36
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published Dec 6, 2024 • 12

OPen data for VLM

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121
DreamO: A Unified Framework for Image Customization

Paper • 2504.16915 • Published Apr 23 • 24
An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 63

alvdansen/flux-koda

Text-to-Image • Updated Aug 16, 2024 • 2.35k • • 244
black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Jun 27 • 1.67M • • 11.4k
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 213k • 3.5k
black-forest-labs/FLUX.1-Fill-dev

Updated Jun 27 • 305k • 920

about 11 hours ago

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 76

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 12
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 14
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 49

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs