---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
---

# UltraVideo: High-Quality UHD 4K Video Dataset

-----

<p align="center">
<a href="https://xzc-zju.github.io/projects/UltraVideo/">Project</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="https://arxiv.org/abs/2506.13691">Paper</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="https://huggingface.co/datasets/APRIL-AIGC/UltraVideo">Hugging Face (UltraVideo Dataset)</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="https://huggingface.co/datasets/APRIL-AIGC/UltraVideo-Long">Hugging Face (UltraVideo-Long Dataset)</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="https://huggingface.co/APRIL-AIGC/UltraWan">Hugging Face (UltraWan-1K/4K Weights)</a>
<br>
</p>

-----

[**UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions**](https://arxiv.org/abs/2506.13691)

- **Click the link below to watch the 4K demo video.**
- **The first open-source UHD-4K/8K video datasets with comprehensive structured captions (10 caption types).**
- **Native 1K/4K video generation with UltraWan.**

[Watch the 4K demo video on YouTube](https://www.youtube.com/watch?v=KPh62pfSHLQ)

## TODO
- [x] Release UltraVideo-Short.
- [x] Release UltraVideo-Long for long video generation and understanding.
- [ ] Release structured captions generated by our pipeline for [Open-Sora-Plan](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0).

## Quickstart

1. Refer to [DiffSynth-Studio/examples/wanvideo](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo) for environment preparation.
``` sh
pip install diffsynth==1.1.7
```
2. Download the [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) model using `huggingface-cli`:
``` sh
pip install "huggingface_hub[cli]"
huggingface-cli download --repo-type model Wan-AI/Wan2.1-T2V-1.3B --local-dir ultrawan_weights/Wan2.1-T2V-1.3B --resume-download
```
3. Download the [UltraWan-1K/4K](https://huggingface.co/APRIL-AIGC/UltraWan) models using `huggingface-cli`:
``` sh
huggingface-cli download --repo-type model APRIL-AIGC/UltraWan --local-dir ultrawan_weights/UltraWan --resume-download
```
4. Generate native 1K/4K videos with the UltraWan LoRA weights.
``` sh
# Single GPU
# UltraWan-1K (LoRA)
CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k
# UltraWan-4K (LoRA)
CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 0 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k
```
``` sh
# Unified Sequence Parallelism (USP) across 6 GPUs
# UltraWan-1K (LoRA)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k
# UltraWan-4K (LoRA)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 1 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k
```
5. Official inference with the original Wan2.1-T2V-1.3B weights (no UltraWan LoRA):
``` sh
# Single GPU
# Original Wan2.1-T2V-1.3B, 1K
CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k

# Unified Sequence Parallelism (USP) across 6 GPUs
# Original Wan2.1-T2V-1.3B, 1K
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k
```
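
Steps 2 to 5 rely on one weights layout and differ only in a few flags: the UltraWan runs add `--model_path`, `--mode lora` and `--lora_alpha` (0.25 for 1K, 0.5 for 4K), the 4K run uses 33 frames at 3840x2160, and the multi-GPU UltraWan runs launch through `torchrun` with `--usp 1`. The sketch below gathers these choices into three hypothetical POSIX shell helpers that do not ship with UltraWan: `check_ultrawan_weights` confirms that the downloaded paths exist, `check_wan_shape` tests a resolution and frame count against the constraints we assume the Wan2.1 pipeline imposes (height and width divisible by 16, which would explain 1088 rather than 1080, and frame counts of the form 4k + 1, such as 81 and 33), and `ultrawan_cmd` prints the command for a preset without running it. The shape rules and the use of `--usp 1` for a multi-GPU baseline run are assumptions to check against `infer.py` and your DiffSynth-Studio version.

```sh
#!/bin/sh
# Hypothetical helpers for the Quickstart; none of them ship with UltraWan.

# Verify that the paths used by the inference commands exist under a weights root.
check_ultrawan_weights() {
    root="${1:-ultrawan_weights}"
    status=0
    if [ ! -d "$root/Wan2.1-T2V-1.3B" ]; then
        echo "missing directory: $root/Wan2.1-T2V-1.3B" >&2
        status=1
    fi
    for ckpt in ultrawan-1k.ckpt ultrawan-4k.ckpt; do
        if [ ! -f "$root/UltraWan/$ckpt" ]; then
            echo "missing file: $root/UltraWan/$ckpt" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumed Wan2.1 shape rules (verify against your DiffSynth-Studio version):
# height and width divisible by 16, num_frames of the form 4k + 1.
check_wan_shape() {
    height="$1"; width="$2"; frames="$3"
    ok=0
    if [ $((height % 16)) -ne 0 ] || [ $((width % 16)) -ne 0 ]; then
        echo "${height}x${width} is not divisible by 16" >&2
        ok=1
    fi
    if [ $((frames % 4)) -ne 1 ]; then
        echo "num_frames $frames is not of the form 4k + 1" >&2
        ok=1
    fi
    return "$ok"
}

# Print (do not run) the command for a preset (ultrawan-1k, ultrawan-4k, ori-1k)
# and a GPU count, using the flag values from this README.
ultrawan_cmd() {
    preset="$1"; gpus="${2:-1}"; w="ultrawan_weights"
    case "$preset" in
        ultrawan-1k)
            extra="--model_path $w/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25"
            shape="--height 1088 --width 1920 --num_frames 81" ;;
        ultrawan-4k)
            extra="--model_path $w/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5"
            shape="--height 2160 --width 3840 --num_frames 33" ;;
        ori-1k)
            extra="--mode full"
            shape="--height 1088 --width 1920 --num_frames 81" ;;
        *)
            echo "unknown preset: $preset" >&2
            return 1 ;;
    esac
    if [ "$gpus" -gt 1 ]; then
        # Multi-GPU: expose GPUs 0..gpus-1, launch with torchrun, enable USP.
        devices=0
        i=1
        while [ "$i" -lt "$gpus" ]; do
            devices="$devices,$i"
            i=$((i + 1))
        done
        launcher="CUDA_VISIBLE_DEVICES=$devices torchrun --standalone --nproc_per_node=$gpus"
        usp=1
    else
        # Single GPU: plain python, USP disabled.
        launcher="CUDA_VISIBLE_DEVICES=0 python"
        usp=0
    fi
    echo "$launcher infer.py --model_dir $w/Wan2.1-T2V-1.3B $extra --usp $usp $shape --out_dir output/$preset"
}
```

For example, `ultrawan_cmd ultrawan-4k 6` prints the six-GPU 4K command from step 4, and `eval "$(ultrawan_cmd ultrawan-4k 6)"` would run it. Printing first keeps the flags reviewable before a long UHD job starts.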

## UltraVideo Dataset
1. Download the [UltraVideo](https://huggingface.co/datasets/APRIL-AIGC/UltraVideo) dataset:
``` sh
huggingface-cli download --repo-type dataset APRIL-AIGC/UltraVideo --local-dir ./UltraVideo --resume-download
```
2. Users must follow [LICENSE_APRIL_LAB](https://github.com/xzc-zju/UltraVideo/blob/main/license-april-lab.txt) to use this dataset.
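
After the download finishes, a per-extension file count is a quick way to see what actually arrived. The sketch below is a hypothetical helper (`summarize_download`, not part of this repository) that makes no assumption about the dataset's file layout: it skips the `.cache` folder that recent `huggingface_hub` versions create inside `--local-dir` to track downloads, and it ignores files without an extension.

```sh
#!/bin/sh
# Hypothetical helper: count downloaded files per extension, most common first.
# The .cache folder kept by huggingface-cli inside --local-dir is skipped;
# files without an extension are not counted.
summarize_download() {
    dir="${1:-./UltraVideo}"
    find "$dir" -path "$dir/.cache" -prune -o -type f -print |
        sed -n 's/.*\.\([^./]*\)$/\1/p' |
        sort | uniq -c | sort -rn
}
```

Usage: `summarize_download ./UltraVideo`.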

<p align="center">
<img src="assets/dataset_comparison.png" width="600"/>
</p>

<p align="center">
<img src="assets/statistic.png" width="600"/>
</p>

## VBench-Style Prompts of UltraVideo
For reference, the VBench-style UltraVideo prompts used in the paper are provided in `assets/ultravideo_prompts_in_VBench_style.json`.
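
The schema of this JSON file is not described here, so the hypothetical helper below (`count_prompts`) only checks that the file parses and prints its number of top-level entries (list items or object keys). It shells out to `python3`, which is assumed to be available.

```sh
#!/bin/sh
# Hypothetical helper: parse a JSON file with python3 and print the number of
# top-level entries (list items or object keys). Exits non-zero on invalid JSON.
count_prompts() {
    python3 -c 'import json, sys; print(len(json.load(open(sys.argv[1], encoding="utf-8"))))' "$1"
}
```

Usage: `count_prompts assets/ultravideo_prompts_in_VBench_style.json`.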

## License Agreement
1. Users must follow [LICENSE_APRIL_LAB](https://github.com/xzc-zju/UltraVideo/blob/main/license-april-lab.txt) to use the UltraVideo dataset.
2. Users must follow [Wan-Video/Wan2.1/LICENSE.txt](https://github.com/Wan-Video/Wan2.1/blob/main/LICENSE.txt) to use Wan-related models.

## Acknowledgements
We would like to thank the contributors to the [Wan2.1](https://github.com/Wan-Video/Wan2.1), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories for their open research.

## Citation

If you find our work helpful, please cite us.

``` bibtex
@article{ultravideo,
  title={UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions},
  author={Xue, Zhucun and Zhang, Jiangning and Hu, Teng and He, Haoyang and Chen, Yinan and Cai, Yuxuan and Wang, Yabiao and Wang, Chengjie and Liu, Yong and Li, Xiangtai and Tao, Dacheng},
  journal={arXiv preprint arXiv:2506.13691},
  year={2025}
}
```