Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.11340

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 9
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 28
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 36
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

Multimodal Model

A curated list for Multimodal Model Generation papers.

OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 75
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published Feb 8 • 22
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 52

Papers - Image - CoT

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 129

Omni-Generation

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published Nov 26, 2024 • 10
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Paper • 2412.01064 • Published Dec 2, 2024 • 46
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published Dec 2, 2024 • 13

Diffusion-Papers

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 18

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

Prompt Expansion

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

Paper • 2409.08240 • Published Sep 12, 2024 • 23
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 44
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Paper • 2410.07133 • Published Oct 9, 2024 • 19
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116

Running on Zero

246

246

FLUX.1 Dev Inpainting Model Beta GPU

🏆

Repair and enhance images using prompts
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 222

Interesting papers

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 9
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 28
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 36
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

Multimodal Model

A curated list for Multimodal Model Generation papers.

OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 75
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published Feb 8 • 22
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 52

Prompt Expansion

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

Paper • 2409.08240 • Published Sep 12, 2024 • 23
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 44
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Paper • 2410.07133 • Published Oct 9, 2024 • 19
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116

Papers - Image - CoT

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 129

Running on Zero

246

246

FLUX.1 Dev Inpainting Model Beta GPU

🏆

Repair and enhance images using prompts
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 222

Omni-Generation

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published Nov 26, 2024 • 10
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Paper • 2412.01064 • Published Dec 2, 2024 • 46
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published Dec 2, 2024 • 13

Interesting papers

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75

Diffusion-Papers

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 18

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs