VGGT: Visual Geometry Grounded Transformer
Meta AI Research; University of Oxford, VGG
Jianyuan Wang, Minghao Chen, Nikita Karaev,
Andrea Vedaldi, Christian Rupprecht, David Novotny
This Hugging Face repository provides a model checkpoint licensed for commercial use, with the exception of military applications. Refer to the LICENSE file for full terms.
Overview
Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views, within seconds.
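The outputs listed above are geometrically related: a depth map combined with the camera intrinsics and extrinsics determines a point map. The NumPy sketch below illustrates that relationship only. It does not use the VGGT API, and the function name `unproject_depth`, the pinhole intrinsics `K`, and the world-to-camera `[R|t]` extrinsics convention are assumptions made for illustration; check the repository for the conventions the model actually uses.

```python
import numpy as np

def unproject_depth(depth, K, cam_from_world):
    """Lift a depth map to world-space 3D points (hypothetical helper).

    depth:          (H, W) depth along the camera z-axis
    K:              (3, 3) pinhole intrinsics (assumed convention)
    cam_from_world: (3, 4) [R|t] with x_cam = R @ x_world + t (assumed convention)
    Returns an (H, W, 3) array of world coordinates.
    """
    H, W = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project pixels to camera-frame rays with z = 1, then scale by depth.
    rays = pix @ np.linalg.inv(K).T
    cam_pts = rays * depth[..., None]

    # Invert the rigid transform: x_world = R^T (x_cam - t).
    R, t = cam_from_world[:, :3], cam_from_world[:, 3]
    return (cam_pts - t) @ R
```

Under these assumptions, comparing the unprojected points with a directly predicted point map is one way to sanity-check that the depth and camera outputs agree with each other.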
Quick Start
Please refer to our GitHub repository for installation and usage instructions.
Citation
If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:
@inproceedings{wang2025vggt,
title={VGGT: Visual Geometry Grounded Transformer},
author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}