Collections

Discover the best community collections!

Collections including paper arxiv:2505.04512

about 9 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

Augmented Self (speech2speech)

gpt-omni/mini-omni2

Any-to-Any • Updated Oct 24, 2024 • 86 • 276
sesame/csm-1b

Text-to-Speech • Updated Jul 23 • 30.8k • 2.2k
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7 • 36

Video Generation Control-Style Transfer

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 20
Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published Dec 10, 2024 • 17
ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Paper • 2412.07721 • Published Dec 10, 2024 • 8
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

Paper • 2412.05355 • Published Dec 6, 2024 • 9

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 47
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Paper • 2312.13314 • Published Dec 20, 2023 • 9
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 260
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55

AI Tools for Art - April & May '25

Tools & models from the upcoming 4th issue of AI Tools for Art 🎉 read more: https://open.substack.com/pub/multimodalaiart

FramePack Video Generation

Collection

fast & compact video generation with FramePack - a next-frame prediction neural network structure that generates videos progressively • 9 items • Updated May 26 • 6
LTX Video 0.9.7&0.9.8

Collection

LTX Video 13B and 13B distilled models for video generation by Lightricks • 5 items • Updated 21 days ago
ByteDance/DreamO

Updated Jun 24 • 98
Running on Zero

590

DreamO

🐨

A Unified Framework for Image Customization

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Paper • 2503.04504 • Published Mar 6 • 3
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

Paper • 2503.15851 • Published Mar 20 • 10
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors

Paper • 2504.11427 • Published Apr 15 • 19
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7 • 36

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 11
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13
