Lanorman (Lavico)
AI & ML interests: None yet
Organizations: None yet
Autoregressive
Benchmark
camera_object_control_gen
Scene_Gen
Diffusion
4D_Diffusion
layout
- Generating Compositional Scenes via Text-to-image RGBA Instance Generation
  Paper • 2411.10913 • Published • 4
- ROICtrl: Boosting Instance Control for Visual Generation
  Paper • 2411.17949 • Published • 87
- Pathways on the Image Manifold: Image Editing via Video Generation
  Paper • 2411.16819 • Published • 37
Diffusion_GS
Virtual Try-On
- Fashion-VDM: Video Diffusion Model for Virtual Try-On
  Paper • 2411.00225 • Published • 11
- GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
  Paper • 2411.03047 • Published • 9
- FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
  Paper • 2411.10499 • Published • 13
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
  Paper • 2411.18350 • Published • 29
Motion_Gen
Dynamic_GS
- SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
  Paper • 2410.17249 • Published • 42
- VeGaS: Video Gaussian Splatting
  Paper • 2411.11024 • Published • 7
- MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
  Paper • 2501.03714 • Published • 9
Video-Gen Training-Free
Relighting
- Subsurface Scattering for 3D Gaussian Splatting
  Paper • 2408.12282 • Published • 7
- Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering
  Paper • 2409.07441 • Published • 11
- GS^3: Efficient Relighting with Triple Gaussian Splatting
  Paper • 2410.11419 • Published • 12
- SpotLight: Shadow-Guided Object Relighting via Diffusion
  Paper • 2411.18665 • Published • 3
NeRF
- ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
  Paper • 2406.06133 • Published • 12
- Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images
  Paper • 2406.13393 • Published • 5
- BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes
  Paper • 2407.15848 • Published • 17
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
  Paper • 2404.01300 • Published • 4
3D-Edit
- DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
  Paper • 2407.11394 • Published • 12
- Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
  Paper • 2410.06985 • Published • 5
- ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
  Paper • 2410.08168 • Published • 9
- MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
  Paper • 2411.02336 • Published • 24
Text-to-3D
- YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
  Paper • 2406.16273 • Published • 43
- Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
  Paper • 2407.06191 • Published • 14
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation
  Paper • 2407.13976 • Published • 5
- MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
  Paper • 2411.17945 • Published • 27
Image-Gen Personalization
- pOps: Photo-Inspired Diffusion Operators
  Paper • 2406.01300 • Published • 18
- AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
  Paper • 2406.06911 • Published • 12
- Interpreting the Weight Space of Customized Diffusion Models
  Paper • 2406.09413 • Published • 20
- EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
  Paper • 2406.09162 • Published • 14
Image-Gen Lightweight
- LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
  Paper • 2405.14477 • Published • 20
- Phased Consistency Model
  Paper • 2405.18407 • Published • 48
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
  Paper • 2411.05007 • Published • 22
- SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
  Paper • 2412.09619 • Published • 27
Image-Gen Edit
- Zero-shot Image Editing with Reference Imitation
  Paper • 2406.07547 • Published • 33
- The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
  Paper • 2406.10601 • Published • 70
- UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
  Paper • 2407.05282 • Published • 15
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
  Paper • 2407.16982 • Published • 42
Image-Gen Text
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
  Paper • 2406.08392 • Published • 21
- Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
  Paper • 2406.10208 • Published • 22
- AMO Sampler: Enhancing Text Rendering with Overshooting
  Paper • 2411.19415 • Published • 5
Image-Gen StyleInject
- Magic Insert: Style-Aware Drag-and-Drop
  Paper • 2407.02489 • Published • 22
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling
  Paper • 2408.05492 • Published • 7
- CSGO: Content-Style Composition in Text-to-Image Generation
  Paper • 2408.16766 • Published • 18
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 39
Video-Gen Edit
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
  Paper • 2405.16537 • Published • 17
- ReVideo: Remake a Video with Motion and Content Control
  Paper • 2405.13865 • Published • 25
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
  Paper • 2406.16863 • Published • 11
- Portrait Video Editing Empowered by Multimodal Generative Priors
  Paper • 2409.13591 • Published • 17
Video-Gen Customization
- Still-Moving: Customized Video Generation without Customized Video Data
  Paper • 2407.08674 • Published • 13
- CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
  Paper • 2408.13239 • Published • 12
- MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
  Paper • 2409.16160 • Published • 33
- Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
  Paper • 2409.17280 • Published • 11
Video-Gen Benchmark
Video-Gen LLM-based
Video-Gen-GS
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
  Paper • 2407.11398 • Published • 10
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
  Paper • 2407.12781 • Published • 13
- V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
  Paper • 2409.13648 • Published • 12
- DressRecon: Freeform 4D Human Reconstruction from Monocular Video
  Paper • 2409.20563 • Published • 9
Video-Gen Dataset
Image-Captioning
SAM_based
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
  Paper • 2408.07931 • Published • 22
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
  Paper • 2408.16768 • Published • 28
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
  Paper • 2410.16268 • Published • 69
RL
- Lessons from Learning to Spin "Pens"
  Paper • 2407.18902 • Published • 21
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents
  Paper • 2410.07484 • Published • 51
- Autonomous Character-Scene Interaction Synthesis from Text Instruction
  Paper • 2410.03187 • Published • 7
Dataset
- VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads
  Paper • 2407.18245 • Published • 11
- AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark
  Paper • 2409.15041 • Published • 14
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
  Paper • 2410.09732 • Published • 55
- MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
  Paper • 2410.13754 • Published • 75
Video-Gen Long
- Training-free Long Video Generation with Chain of Diffusion Model Experts
  Paper • 2408.13423 • Published • 23
- Loong: Generating Minute-level Long Videos with Autoregressive Language Models
  Paper • 2410.02757 • Published • 36
- MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
  Paper • 2411.13807 • Published • 11
- TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
  Paper • 2411.18671 • Published • 20
Omni-Generation
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 115
- Video-Guided Foley Sound Generation with Multimodal Controls
  Paper • 2411.17698 • Published • 10
- FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
  Paper • 2412.01064 • Published • 46
- OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
  Paper • 2412.01169 • Published • 13
GAN
VLM
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 52
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 46
- Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
  Paper • 2501.04003 • Published • 27
- VideoRAG: Retrieval-Augmented Generation over Video Corpus
  Paper • 2501.05874 • Published • 73
3D_GEN_lightweight
Light Memory Diffusion
Pose_Estimation
Animating
- Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
  Paper • 2411.18197 • Published • 14
- SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
  Paper • 2412.00174 • Published • 23
- One Shot, One Talk: Whole-body Talking Avatar from a Single Image
  Paper • 2412.01106 • Published • 23
- DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
  Paper • 2412.09349 • Published • 8
Accelerator
Medical
FLUX_related
- Training-free Regional Prompting for Diffusion Transformers
  Paper • 2411.02395 • Published • 25
- Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
  Paper • 2411.06558 • Published • 36
- FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
  Paper • 2412.09611 • Published • 10
- 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering
  Paper • 2501.05131 • Published • 37
Image_Restoration
Dynamic_Gen
Video-Gen Pose
Vision Task
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
  Paper • 2406.09406 • Published • 15
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos
  Paper • 2406.10227 • Published • 9
- What If We Recaption Billions of Web Images with LLaMA-3?
  Paper • 2406.08478 • Published • 41
3D
- GECO: Generative Image-to-3D within a SECOnd
  Paper • 2405.20327 • Published • 11
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
  Paper • 2406.03184 • Published • 22
- NPGA: Neural Parametric Gaussian Avatars
  Paper • 2405.19331 • Published • 10
- Unified Text-to-Image Generation and Retrieval
  Paper • 2406.05814 • Published • 16
GS
- GFlow: Recovering 4D World from Monocular Video
  Paper • 2405.18426 • Published • 17
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
  Paper • 2405.18424 • Published • 9
- HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
  Paper • 2405.15125 • Published • 8
- PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
  Paper • 2405.19957 • Published • 10
3D_Diffusion
- SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
  Paper • 2408.10195 • Published • 13
- MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
  Paper • 2408.08000 • Published • 9
- DC3DO: Diffusion Classifier for 3D Objects
  Paper • 2408.06693 • Published • 11
- MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
  Paper • 2408.14211 • Published • 11
Image-Gen Theoretical
- Guiding a Diffusion Model with a Bad Version of Itself
  Paper • 2406.02507 • Published • 17
- Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
  Paper • 2406.04314 • Published • 30
- An Image is Worth 32 Tokens for Reconstruction and Generation
  Paper • 2406.07550 • Published • 59
- Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
  Paper • 2406.07546 • Published • 9
Image-Gen Accelerator(Distill)
- MLCM: Multistep Consistency Distillation of Latent Diffusion Model
  Paper • 2406.05768 • Published • 13
- Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
  Paper • 2406.14539 • Published • 27
- HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
  Paper • 2410.01723 • Published • 5
- NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
  Paper • 2412.02030 • Published • 19
Image-Gen Autoregressive
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 71
- Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
  Paper • 2405.21048 • Published • 16
- Scalable Autoregressive Image Generation with Mamba
  Paper • 2408.12245 • Published • 26
- DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
  Paper • 2410.08159 • Published • 26
Image-Gen
- Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
  Paper • 2406.09416 • Published • 29
- Wavelets Are All You Need for Autoregressive Image Generation
  Paper • 2406.19997 • Published • 31
- ViPer: Visual Personalization of Generative Models via Individual Preference Learning
  Paper • 2407.17365 • Published • 13
- MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
  Paper • 2408.11001 • Published • 13
Image-Gen Story
- SEED-Story: Multimodal Long Story Generation with Large Language Model
  Paper • 2407.08683 • Published • 25
- Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
  Paper • 2410.06244 • Published • 19
- Unbounded: A Generative Infinite Game of Character Life Simulation
  Paper • 2410.18975 • Published • 37
- Generative AI for Cel-Animation: A Survey
  Paper • 2501.06250 • Published • 13
Video-Gen
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
  Paper • 2405.20222 • Published • 11
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
  Paper • 2406.00908 • Published • 12
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
  Paper • 2406.02509 • Published • 10
- I4VGen: Image as Stepping Stone for Text-to-Video Generation
  Paper • 2406.02230 • Published • 18
Video_Gen lightweight
Video_Gen Translation
Video-Gen Trajectory
- Tora: Trajectory-oriented Diffusion Transformer for Video Generation
  Paper • 2407.21705 • Published • 27
- TrackGo: A Flexible and Efficient Method for Controllable Video Generation
  Paper • 2408.11475 • Published • 18
- TVG: A Training-free Transition Video Generation Method with Diffusion Models
  Paper • 2408.13413 • Published • 14
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
  Paper • 2409.18964 • Published • 26
Video-3D
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
  Paper • 2407.12781 • Published • 13
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
  Paper • 2409.07452 • Published • 21
- Novel View Extrapolation with Video Diffusion Priors
  Paper • 2411.14208 • Published • 10
- World-consistent Video Diffusion with Explicit 3D Modeling
  Paper • 2412.01821 • Published • 4
Video-Gen Diffusion_4D(DiT etc)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
  Paper • 2405.17405 • Published • 16
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation
  Paper • 2405.20674 • Published • 15
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
  Paper • 2406.07472 • Published • 13
Video-Audio
Segmentation
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 116
- Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
  Paper • 2408.07416 • Published • 7
- SMITE: Segment Me In TimE
  Paper • 2410.18538 • Published • 16
- ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
  Paper • 2410.23287 • Published • 19
depthmap
- Depth Anything V2
  Paper • 2406.09414 • Published • 103
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
  Paper • 2406.12849 • Published • 50
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
  Paper • 2407.17952 • Published • 32
- Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
  Paper • 2409.18124 • Published • 33
Robot-related
- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
  Paper • 2406.02523 • Published • 12
- UniT: Unified Tactile Representation for Robot Learning
  Paper • 2408.06481 • Published • 10
- Latent Action Pretraining from Videos
  Paper • 2410.11758 • Published • 3
- Neural Fields in Robotics: A Survey
  Paper • 2410.20220 • Published • 5
Evaluation
Captioning
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
  Paper • 2409.02889 • Published • 54
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
  Paper • 2409.18125 • Published • 34
- FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
  Paper • 2411.15411 • Published • 8
Detection
LLM
GAN
Autoregressive
VLM
-
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 52 -
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 46 -
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Paper • 2501.04003 • Published • 27 -
VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper • 2501.05874 • Published • 73
Benchmark
3D_GEN_lightweight
camera_object_control_gen
Light Memory Diffusion
Scene_Gen
Pose_Estimation
Diffusion
Animating
-
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Paper • 2411.18197 • Published • 14 -
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Paper • 2412.00174 • Published • 23 -
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
Paper • 2412.01106 • Published • 23 -
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Paper • 2412.09349 • Published • 8
4D_Diffusion
Accelerator
layout
-
Generating Compositional Scenes via Text-to-image RGBA Instance Generation
Paper • 2411.10913 • Published • 4 -
ROICtrl: Boosting Instance Control for Visual Generation
Paper • 2411.17949 • Published • 87 -
Pathways on the Image Manifold: Image Editing via Video Generation
Paper • 2411.16819 • Published • 37
Medical
Diffusion_GS
FLUX_related
-
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 25 -
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
Paper • 2411.06558 • Published • 36 -
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 10 -
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering
Paper • 2501.05131 • Published • 37
Virtual-Try On
-
Fashion-VDM: Video Diffusion Model for Virtual Try-On
Paper • 2411.00225 • Published • 11 -
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper • 2411.03047 • Published • 9 -
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
Paper • 2411.10499 • Published • 13 -
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
Paper • 2411.18350 • Published • 29
Image_Restoration
Motion_Gen
Dynamic_Gen
Dynamic_GS
-
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Paper • 2410.17249 • Published • 42 -
VeGaS: Video Gaussian Splatting
Paper • 2411.11024 • Published • 7 -
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Paper • 2501.03714 • Published • 9
Video-Gen Pose
Video-Gen Training-Free
Vision Task
-
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Paper • 2406.09406 • Published • 15 -
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Paper • 2406.10227 • Published • 9 -
What If We Recaption Billions of Web Images with LLaMA-3?
Paper • 2406.08478 • Published • 41
Relighting
-
Subsurface Scattering for 3D Gaussian Splatting
Paper • 2408.12282 • Published • 7 -
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering
Paper • 2409.07441 • Published • 11 -
GS^3: Efficient Relighting with Triple Gaussian Splatting
Paper • 2410.11419 • Published • 12 -
SpotLight: Shadow-Guided Object Relighting via Diffusion
Paper • 2411.18665 • Published • 3
3D
-
GECO: Generative Image-to-3D within a SECOnd
Paper • 2405.20327 • Published • 11 -
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Paper • 2406.03184 • Published • 22 -
NPGA: Neural Parametric Gaussian Avatars
Paper • 2405.19331 • Published • 10 -
Unified Text-to-Image Generation and Retrieval
Paper • 2406.05814 • Published • 16
NeRF
-
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Paper • 2406.06133 • Published • 12 -
Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images
Paper • 2406.13393 • Published • 5 -
BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes
Paper • 2407.15848 • Published • 17 -
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Paper • 2404.01300 • Published • 4
GS
-
GFlow: Recovering 4D World from Monocular Video
Paper • 2405.18426 • Published • 17 -
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Paper • 2405.18424 • Published • 9 -
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
Paper • 2405.15125 • Published • 8 -
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
Paper • 2405.19957 • Published • 10
3D-Edit
-
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
Paper • 2407.11394 • Published • 12 -
Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
Paper • 2410.06985 • Published • 5 -
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
Paper • 2410.08168 • Published • 9 -
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper • 2411.02336 • Published • 24
3D_Diffusion
-
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Paper • 2408.10195 • Published • 13 -
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
Paper • 2408.08000 • Published • 9 -
DC3DO: Diffusion Classifier for 3D Objects
Paper • 2408.06693 • Published • 11 -
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Paper • 2408.14211 • Published • 11
Text-to-3D
-
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
Paper • 2406.16273 • Published • 43 -
Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
Paper • 2407.06191 • Published • 14 -
PlacidDreamer: Advancing Harmony in Text-to-3D Generation
Paper • 2407.13976 • Published • 5 -
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Paper • 2411.17945 • Published • 27
Image-Gen Theoretical
-
Guiding a Diffusion Model with a Bad Version of Itself
Paper • 2406.02507 • Published • 17 -
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper • 2406.04314 • Published • 30 -
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper • 2406.07550 • Published • 59 -
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Paper • 2406.07546 • Published • 9
Image-Gen Personalization
-
pOps: Photo-Inspired Diffusion Operators
Paper • 2406.01300 • Published • 18 -
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper • 2406.06911 • Published • 12 -
Interpreting the Weight Space of Customized Diffusion Models
Paper • 2406.09413 • Published • 20 -
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Paper • 2406.09162 • Published • 14
Image-Gen Accelerator(Distill)
-
MLCM: Multistep Consistency Distillation of Latent Diffusion Model
Paper • 2406.05768 • Published • 13 -
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
Paper • 2406.14539 • Published • 27 -
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
Paper • 2410.01723 • Published • 5 -
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Paper • 2412.02030 • Published • 19
Image-Gen Lightweight
-
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
Paper • 2405.14477 • Published • 20 -
Phased Consistency Model
Paper • 2405.18407 • Published • 48 -
SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Paper • 2411.05007 • Published • 22 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 27
Image-Gen Autoregressive
-
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71 -
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Paper • 2405.21048 • Published • 16 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 26 -
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Paper • 2410.08159 • Published • 26
Image-Gen Edit
-
Zero-shot Image Editing with Reference Imitation
Paper • 2406.07547 • Published • 33 -
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Paper • 2406.10601 • Published • 70 -
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Paper • 2407.05282 • Published • 15 -
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Paper • 2407.16982 • Published • 42
Image-Gen
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 29 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 31 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 13 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 13
Image-Gen Text
-
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
Paper • 2406.08392 • Published • 21 -
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Paper • 2406.10208 • Published • 22 -
AMO Sampler: Enhancing Text Rendering with Overshooting
Paper • 2411.19415 • Published • 5
Image-Gen Story
-
SEED-Story: Multimodal Long Story Generation with Large Language Model
Paper • 2407.08683 • Published • 25 -
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
Paper • 2410.06244 • Published • 19 -
Unbounded: A Generative Infinite Game of Character Life Simulation
Paper • 2410.18975 • Published • 37 -
Generative AI for Cel-Animation: A Survey
Paper • 2501.06250 • Published • 13
Image-Gen StyleInject
-
Magic Insert: Style-Aware Drag-and-Drop
Paper • 2407.02489 • Published • 22 -
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
Paper • 2408.05492 • Published • 7 -
CSGO: Content-Style Composition in Text-to-Image Generation
Paper • 2408.16766 • Published • 18 -
Style-Friendly SNR Sampler for Style-Driven Generation
Paper • 2411.14793 • Published • 39
Video-Gen
-
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Paper • 2405.20222 • Published • 11 -
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
Paper • 2406.00908 • Published • 12 -
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
Paper • 2406.02509 • Published • 10 -
I4VGen: Image as Stepping Stone for Text-to-Video Generation
Paper • 2406.02230 • Published • 18
Video-Gen Edit
-
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
Paper • 2405.16537 • Published • 17 -
ReVideo: Remake a Video with Motion and Content Control
Paper • 2405.13865 • Published • 25 -
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
Paper • 2406.16863 • Published • 11 -
Portrait Video Editing Empowered by Multimodal Generative Priors
Paper • 2409.13591 • Published • 17
Video_Gen lightweight
Video-Gen Customization
-
Still-Moving: Customized Video Generation without Customized Video Data
Paper • 2407.08674 • Published • 13 -
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Paper • 2408.13239 • Published • 12 -
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Paper • 2409.16160 • Published • 33 -
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Paper • 2409.17280 • Published • 11
Video_Gen Translation
Video-Gen Benchmark
Video-Gen Trajectory
-
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper • 2407.21705 • Published • 27 -
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Paper • 2408.11475 • Published • 18 -
TVG: A Training-free Transition Video Generation Method with Diffusion Models
Paper • 2408.13413 • Published • 14 -
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper • 2409.18964 • Published • 26
Video-Gen LLM-based
Video-3D
-
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Paper • 2407.12781 • Published • 13 -
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
Paper • 2409.07452 • Published • 21 -
Novel View Extrapolation with Video Diffusion Priors
Paper • 2411.14208 • Published • 10 -
World-consistent Video Diffusion with Explicit 3D Modeling
Paper • 2412.01821 • Published • 4
Video-Gen-GS
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
  Paper • 2407.11398 • Published • 10
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
  Paper • 2407.12781 • Published • 13
- V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
  Paper • 2409.13648 • Published • 12
- DressRecon: Freeform 4D Human Reconstruction from Monocular Video
  Paper • 2409.20563 • Published • 9
Video-Gen Diffusion_4D (DiT etc.)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
  Paper • 2405.17405 • Published • 16
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation
  Paper • 2405.20674 • Published • 15
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
  Paper • 2406.07472 • Published • 13
Video-Gen Dataset
Video-Audio
Image-Captioning
Segmentation
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 116
- Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
  Paper • 2408.07416 • Published • 7
- SMITE: Segment Me In TimE
  Paper • 2410.18538 • Published • 16
- ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
  Paper • 2410.23287 • Published • 19
SAM_based
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
  Paper • 2408.07931 • Published • 22
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
  Paper • 2408.16768 • Published • 28
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
  Paper • 2410.16268 • Published • 69
depthmap
- Depth Anything V2
  Paper • 2406.09414 • Published • 103
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
  Paper • 2406.12849 • Published • 50
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
  Paper • 2407.17952 • Published • 32
- Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
  Paper • 2409.18124 • Published • 33
RL
- Lessons from Learning to Spin "Pens"
  Paper • 2407.18902 • Published • 21
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents
  Paper • 2410.07484 • Published • 51
- Autonomous Character-Scene Interaction Synthesis from Text Instruction
  Paper • 2410.03187 • Published • 7
Robot-related
- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
  Paper • 2406.02523 • Published • 12
- UniT: Unified Tactile Representation for Robot Learning
  Paper • 2408.06481 • Published • 10
- Latent Action Pretraining from Videos
  Paper • 2410.11758 • Published • 3
- Neural Fields in Robotics: A Survey
  Paper • 2410.20220 • Published • 5
Dataset
- VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads
  Paper • 2407.18245 • Published • 11
- AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark
  Paper • 2409.15041 • Published • 14
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
  Paper • 2410.09732 • Published • 55
- MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
  Paper • 2410.13754 • Published • 75
Evaluation
Video-Gen Long
- Training-free Long Video Generation with Chain of Diffusion Model Experts
  Paper • 2408.13423 • Published • 23
- Loong: Generating Minute-level Long Videos with Autoregressive Language Models
  Paper • 2410.02757 • Published • 36
- MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
  Paper • 2411.13807 • Published • 11
- TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
  Paper • 2411.18671 • Published • 20
Captioning
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
  Paper • 2409.02889 • Published • 54
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
  Paper • 2409.18125 • Published • 34
- FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
  Paper • 2411.15411 • Published • 8
Omni-Generation
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 115
- Video-Guided Foley Sound Generation with Multimodal Controls
  Paper • 2411.17698 • Published • 10
- FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
  Paper • 2412.01064 • Published • 46
- OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
  Paper • 2412.01169 • Published • 13
Detection