Collections
Collections including paper arxiv:2507.03745
- A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
  Paper • 2507.07202 • Published • 22
- StreamDiT: Real-Time Streaming Text-to-Video Generation
  Paper • 2507.03745 • Published • 29
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
  Paper • 2507.01945 • Published • 78
- TokensGen: Harnessing Condensed Tokens for Long Video Generation
  Paper • 2507.15728 • Published • 7

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 11
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- From Slow Bidirectional to Fast Causal Video Generators
  Paper • 2412.07772 • Published • 1
- Navigation World Models
  Paper • 2412.03572 • Published • 2
- MAGI-1: Autoregressive Video Generation at Scale
  Paper • 2505.13211 • Published • 4
- Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
  Paper • 2507.08801 • Published • 30

- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 102
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
  Paper • 2506.08009 • Published • 28
- Seeing Voices: Generating A-Roll Video from Audio with Mirage
  Paper • 2506.08279 • Published • 28
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
  Paper • 2506.07848 • Published • 4

- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
  Paper • 2306.10012 • Published • 36
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 46
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
  Paper • 2408.06072 • Published • 40
- haoningwu/StoryGen
  Updated • 4