tencent/HunyuanWorld-Voyager

We introduce HunyuanWorld-Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with a user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration along custom camera trajectories. It can also jointly generate aligned depth and RGB video, which allows direct and efficient 3D reconstruction.
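To illustrate why aligned depth and RGB frames make 3D reconstruction direct, here is a minimal sketch of standard pinhole back-projection using NumPy. It is not taken from the Voyager codebase: the function name `unproject_rgbd`, the intrinsics matrix `K`, and the `cam_to_world` pose convention are all assumptions made for this example, and it assumes `depth` stores z-depth along the camera axis.

```python
import numpy as np

def unproject_rgbd(rgb, depth, K, cam_to_world):
    """Back-project one aligned RGB-D frame into a colored world-space point cloud.

    Hypothetical helper, not part of Voyager. Assumed inputs:
      rgb:          (H, W, 3) colors
      depth:        (H, W) z-depth per pixel, 0 where invalid
      K:            (3, 3) pinhole intrinsics
      cam_to_world: (4, 4) camera-to-world pose
    """
    h, w = depth.shape
    # u holds the pixel column index, v the pixel row index, both shaped (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Pinhole model: scale the normalized ray by the z-depth.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)

    # Move homogeneous camera-space points into world space.
    pts_world = (pts_cam @ cam_to_world.T)[:, :3]

    # Drop pixels without a valid depth value.
    valid = depth.reshape(-1) > 0
    return pts_world[valid], rgb.reshape(-1, 3)[valid]
```

Under these assumptions, applying the function to every frame of a generated RGB-D sequence with its camera pose and concatenating the results gives a single colored point cloud for the scene.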

🔗 BibTeX

If you find Voyager useful for your research and applications, please cite using this BibTeX:

@article{huang2025voyager,
  title={Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation},
  author={Huang, Tianyu and Zheng, Wangguandong and Wang, Tengfei and Liu, Yuhao and Wang, Zhenwei and Wu, Junta and Jiang, Jie and Li, Hui and Lau, Rynson WH and Zuo, Wangmeng and Guo, Chunchao},
  journal={arXiv preprint arXiv:2506.04225},
  year={2025}
}

Acknowledgements

We would like to thank HunyuanWorld, Hunyuan3D-2, and HunyuanVideo-I2V. We also thank VGGT, MoGE, and Metric3D for their open research and exploration.