This project introduces WeTok, a powerful discrete visual tokenizer designed to resolve the long-standing conflict between compression efficiency and reconstruction fidelity. WeTok achieves state-of-the-art reconstruction quality, surpassing previous leading discrete and continuous tokenizers.
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Ying Zhang, Chen Li, Yali Wang
Shanghai Jiao Tong University, WeChat Vision (Tencent Inc.), Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), Hong Kong University of Science and Technology, Shanghai AI Laboratory
@article{zhuang2026wetok,
  title={WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction},
  author={Zhuang, Shaobin and Guo, Yiwei and Fu, Canmiao and Huang, Zhipeng and Tian, Zeyue and Zhang, Ying and Li, Chen and Wang, Yali},
  journal={arXiv preprint arXiv:2508.05599},
  year={2025}
}
WeTok achieves a new state-of-the-art in reconstruction fidelity, surpassing both discrete and continuous tokenizers, while offering high compression ratios.
News
- [2025.08.08] We are excited to release WeTok, a powerful discrete tokenizer featuring our novel Group-Wise Lookup-Free Quantization (GQ) and a Generative Decoder (GD). Code and pretrained models are now available!
Implementations
Installation
- Dependencies:
bash env.sh
Evaluation
- Evaluation on ImageNet 50K Validation Set
The dataset should be organized as follows:
imagenet
└── val/
    ├── ...
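Before launching a long evaluation run, it can help to confirm that the split directory exists and holds the expected number of images. The following is a minimal sketch, not part of the WeTok repository: the helper names `count_images` and `check_split`, the `./imagenet/val` path, and the list of file extensions are all assumptions chosen for illustration. The expected count of 50,000 comes from the size of the ImageNet validation set named above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with WeTok.

# Count image files under a directory (searched recursively).
# The extension list is an assumption; extend it if your copy differs.
count_images() {
  local dir="$1"
  find "$dir" -type f \( -iname '*.jpeg' -o -iname '*.jpg' -o -iname '*.png' \) | wc -l | tr -d ' '
}

# Return 0 only if the directory exists and contains exactly the expected number of images.
check_split() {
  local dir="$1" expected="$2" found
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  found="$(count_images "$dir")"
  if [ "$found" -ne "$expected" ]; then
    echo "expected $expected images in $dir, found $found" >&2
    return 1
  fi
  echo "ok: $dir ($found images)"
}

# Example call (the path is an assumption; adjust it to your setup):
#   check_split ./imagenet/val 50000
```

A failing check prints the mismatch to stderr and returns a non-zero status, so it can be chained in front of an evaluation command with `&&`.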
Run the 256×256 resolution evaluation script:
bash scripts/evaluation/imagenet_evaluation_256_dist.sh
Run the original resolution evaluation script:
bash scripts/evaluation/imagenet_evaluation_original_dist.sh
- Evaluation on MS-COCO Val2017
The dataset should be organized as follows:
MSCOCO2017
└── val2017/
    ├── ...
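If the MS-COCO images are already stored elsewhere on disk, a symbolic link is one way to present them in the layout above without copying. This is only a sketch: the helper name `link_coco_val` and the example paths are hypothetical. This README does not say whether the evaluation scripts follow symlinks or where they expect the dataset root, so check the script's data path before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with WeTok.

# Expose an existing val2017 folder as <root>/val2017 via a symbolic link.
link_coco_val() {
  local src="$1" root="$2"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  # Refuse to touch a real (non-link) directory that is already in place.
  if [ -d "$root/val2017" ] && [ ! -L "$root/val2017" ]; then
    echo "$root/val2017 already exists as a directory; leaving it untouched" >&2
    return 1
  fi
  mkdir -p "$root"
  # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
  ln -sfn "$(cd "$src" && pwd)" "$root/val2017"
}

# Example call (both paths are assumptions; adjust them to your setup):
#   link_coco_val /data/coco/val2017 ./MSCOCO2017
```

The source path is resolved to an absolute path before linking, so the link stays valid when the command is run from a different working directory.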
Run the 256×256 resolution evaluation script:
bash scripts/evaluation/mscocoval_evaluation_256_dist.sh
Run the original resolution evaluation script:
bash scripts/evaluation/mscoco_evaluation_original_dist.sh
Inference
To quickly check each model's reconstruction quality, run:
bash scripts/inference/reconstruct_image.sh
Qualitative comparison of 512 × 512 image reconstruction on TokBench.
WeTok-AR-XL generated samples at 256 × 256 resolution.
Acknowledgement
Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of Open-MAGVIT2. We also drew inspiration from the methodologies presented in LFQ and BSQ. We are grateful for their contributions to the community.