APRIL-AIGC committed on
Commit
1a317e1
·
verified ·
1 Parent(s): 94229bd

Update README.md

  1. README.md +100 -1
README.md CHANGED
@@ -2,4 +2,103 @@
2
  base_model:
3
  - Wan-AI/Wan2.1-T2V-1.3B
4
  pipeline_tag: text-to-video
5
- ---
2
  base_model:
3
  - Wan-AI/Wan2.1-T2V-1.3B
4
  pipeline_tag: text-to-video
5
+ ---
6
+
7
+
8
+ # UltraVideo: High-Quality UHD 4K Video Dataset
9
+
10
+ -----
11
+
12
+ <p align="center">
13
+ 🤓 <a href="https://xzc-zju.github.io/projects/UltraVideo/">Project</a> &nbsp;&nbsp; | 📑 <a href="https://arxiv.org/abs/2506.13691">Paper</a> &nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/datasets/APRIL-AIGC/UltraVideo">Hugging Face (UltraVideo Dataset)</a>&nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/datasets/APRIL-AIGC/UltraVideo-Long">Hugging Face (UltraVideo-Long Dataset)</a>&nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/APRIL-AIGC/UltraWan">Hugging Face (UltraWan-1K/4K Weights)</a>&nbsp;&nbsp;
14
+ <br>
15
+
16
+ -----
17
+
18
+ [**UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions**](https://arxiv.org/abs/2506.13691)
19
+
20
+ - 🎋 **Click the image below to watch the 4K demo video.**
21
+ - 🤓 **The first open-source UHD 4K/8K video dataset with comprehensive structured captions (10 types).**
22
+ - 🤓 **Native 1K/4K video generation with UltraWan.**
23
+
24
+ [![](assets/ultravideo.png)](https://www.youtube.com/watch?v=KPh62pfSHLQ)
25
+
26
+ ## TODO
27
+ - [x] Release UltraVideo-Short
28
+ - [x] Release UltraVideo-Long for long video generation and understanding.
29
+ - [ ] Release structured captions produced by our pipeline (PPL) for [Open-Sora-Plan](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0).
30
+
31
+
32
+ ## Quickstart
33
+
34
+ 1. Refer to [DiffSynth-Studio/examples/wanvideo](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo) for environment preparation.
35
+ ``` sh
36
+ pip install diffsynth==1.1.7
37
+ ```
38
+ 2. Download the [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) model using `huggingface-cli`:
39
+ ``` sh
40
+ pip install "huggingface_hub[cli]"
41
+ huggingface-cli download --repo-type model Wan-AI/Wan2.1-T2V-1.3B --local-dir ultrawan_weights/Wan2.1-T2V-1.3B --resume-download
42
+ ```
43
+ 3. Download the [UltraWan-1K/4K](https://huggingface.co/APRIL-AIGC/UltraWan) models using `huggingface-cli`:
44
+ ``` sh
45
+ huggingface-cli download --repo-type model APRIL-AIGC/UltraWan --local-dir ultrawan_weights/UltraWan --resume-download
46
+ ```
47
+ 4. Generate native 1K/4K videos.
48
+ ``` sh
49
+ # ==> one GPU
50
+ CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k  # LoRA_1k
51
+ CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 0 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k  # LoRA_4k
52
+ ```
53
+ ``` sh
54
+ # ==> usp with 6 GPUs
55
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k  # LoRA_1k
56
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 1 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k  # LoRA_4k
57
+ ```
58
+ 5. Run inference with the official Wan2.1-T2V-1.3B weights (without the UltraWan LoRA).
59
+ ``` sh
60
+ # ==> one GPU
61
+ CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k  # ori_1k
62
+
63
+ # ==> usp with 6 GPUs
64
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k  # ori_1k
65
+ ```
66
+
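The single-GPU and multi-GPU LoRA commands in step 4 differ only in the launcher, the visible devices, and the `--usp` value. As a minimal sketch, the helper below assembles either variant from the flag values shown above. `build_infer_cmd` and `PRESETS` are illustrative names that are not part of the repository, and the sketch assumes that `--usp 1` is what enables the multi-GPU path, as the step 4 commands suggest.

```python
import shlex

# Flag values copied from the step 4 commands; PRESETS is an illustrative name.
PRESETS = {
    "1k": {"lora_alpha": 0.25, "height": 1088, "width": 1920, "num_frames": 81},
    "4k": {"lora_alpha": 0.5, "height": 2160, "width": 3840, "num_frames": 33},
}


def build_infer_cmd(preset, num_gpus=1, weights_dir="ultrawan_weights"):
    """Return (env, argv) for an infer.py LoRA run with the given preset."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    cfg = PRESETS[preset]
    # One GPU: plain python. Several GPUs: torchrun with one process per GPU.
    if num_gpus == 1:
        launcher = ["python"]
    else:
        launcher = ["torchrun", "--standalone", f"--nproc_per_node={num_gpus}"]
    argv = launcher + [
        "infer.py",
        "--model_dir", f"{weights_dir}/Wan2.1-T2V-1.3B",
        "--model_path", f"{weights_dir}/UltraWan/ultrawan-{preset}.ckpt",
        "--mode", "lora",
        "--lora_alpha", str(cfg["lora_alpha"]),
        # Assumption: --usp 1 turns on the multi-GPU path, --usp 0 turns it off.
        "--usp", "1" if num_gpus > 1 else "0",
        "--height", str(cfg["height"]),
        "--width", str(cfg["width"]),
        "--num_frames", str(cfg["num_frames"]),
        "--out_dir", f"output/ultrawan-{preset}",
    ]
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(num_gpus))}
    return env, argv


if __name__ == "__main__":
    env, argv = build_infer_cmd("4k", num_gpus=6)
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    print(prefix, shlex.join(argv))
```

The function only builds the command; printing it (or passing `argv` and `env` to `subprocess.run`) is left to the caller, so the presets can be inspected without a GPU.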
67
+ ## UltraVideo Dataset
68
+ 1. Download the [UltraVideo](https://huggingface.co/datasets/APRIL-AIGC/UltraVideo) dataset.
69
+ ``` sh
70
+ huggingface-cli download --repo-type dataset APRIL-AIGC/UltraVideo --local-dir ./UltraVideo --resume-download
71
+ ```
72
+ 2. Users must follow [LICENSE_APRIL_LAB](https://github.com/xzc-zju/UltraVideo/blob/main/license-april-lab.txt) to use this dataset.
73
+
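Once the download finishes, a quick inventory of the local copy can confirm that the files arrived. The sketch below counts files and bytes per extension under `./UltraVideo`, the `--local-dir` used in the command above. It makes no assumption about the dataset's internal layout, and `summarize_dataset_dir` is an illustrative helper rather than part of the project.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset_dir(root):
    """Return {extension: (file_count, total_bytes)} for files under root."""
    root = Path(root)
    counts = Counter()
    sizes = Counter()
    for path in root.rglob("*"):
        # Skip hidden files and directories (for example download metadata).
        if any(part.startswith(".") for part in path.relative_to(root).parts):
            continue
        if path.is_file():
            ext = path.suffix.lower() or "<none>"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}


if __name__ == "__main__":
    dataset_dir = Path("./UltraVideo")
    if dataset_dir.is_dir():
        for ext, (count, size) in summarize_dataset_dir(dataset_dir).items():
            print(f"{ext:>8}  {count:6d} files  {size / 1e9:8.2f} GB")
    else:
        print(f"{dataset_dir} not found; run the download command first")
```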
74
+ <p align="center">
75
+ <img src="assets/dataset_comparison.png" width="600"/>
76
+ </p>
77
+
78
+ <p align="center">
79
+ <img src="assets/statistic.png" width="600"/>
80
+ </p>
81
+
82
+ ## VBench-Style Prompts of UltraVideo
83
+ The VBench-style prompts used for UltraVideo in the paper are provided for reference in `assets/ultravideo_prompts_in_VBench_style.json`.
84
+
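The prompt file can be inspected with the standard `json` module. The schema of the file is not documented here, so the sketch below only assumes that it is valid JSON whose top level is either a list or an object; `load_prompts` is an illustrative helper, not part of the repository.

```python
import json
from pathlib import Path


def load_prompts(path):
    """Read the prompt file and return its top-level entries as a list."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        # Keep each key next to its value so no information is dropped.
        return [{"key": key, "value": value} for key, value in data.items()]
    raise TypeError(f"unexpected top-level JSON type: {type(data).__name__}")


if __name__ == "__main__":
    prompt_file = Path("assets/ultravideo_prompts_in_VBench_style.json")
    if prompt_file.exists():
        prompts = load_prompts(prompt_file)
        first = prompts[0] if prompts else None
        print(f"{len(prompts)} entries; first entry: {first}")
    else:
        print(f"{prompt_file} not found; run this from the repository root")
```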
85
+ ## License Agreement
86
+ 1. Users must follow [LICENSE_APRIL_LAB](https://github.com/xzc-zju/UltraVideo/blob/main/license-april-lab.txt) to use the UltraVideo dataset.
87
+ 2. Users must follow [Wan-Video/Wan2.1/LICENSE.txt](https://github.com/Wan-Video/Wan2.1/blob/main/LICENSE.txt) to use Wan-related models.
88
+
89
+
90
+ ## Acknowledgements
91
+ We thank the contributors to the [Wan2.1](https://github.com/Wan-Video/Wan2.1), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories for their open research.
92
+
93
+ ## Citation
94
+
95
+ If you find our work helpful, please cite us.
96
+
97
+ ```
98
+ @article{ultravideo,
99
+ title={UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions},
100
+ author={Xue, Zhucun and Zhang, Jiangning and Hu, Teng and He, Haoyang and Chen, Yinan and Cai, Yuxuan and Wang, Yabiao and Wang, Chengjie and Liu, Yong and Li, Xiangtai and Tao, Dacheng},
101
+ journal={arXiv preprint arXiv:2506.13691},
102
+ year={2025}
103
+ }
104
+ ```