🚀 WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction

This project introduces WeTok, a powerful discrete visual tokenizer designed to resolve the long-standing conflict between compression efficiency and reconstruction fidelity. WeTok achieves state-of-the-art reconstruction quality, surpassing previous leading discrete and continuous tokenizers.

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Ying Zhang, Chen Li, Yali Wang
Shanghai Jiao Tong University, WeChat Vision (Tencent Inc.), Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), Hong Kong University of Science and Technology, Shanghai AI Laboratory
📚 WeTok.md

@article{zhuang2026wetok,
  title={WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction},
  author={Zhuang, Shaobin and Guo, Yiwei and Fu, Canmiao and Huang, Zhipeng and Tian, Zeyue and Zhang, Ying and Li, Chen and Wang, Yali},
  journal={arXiv preprint arXiv:2508.05599},
  year={2025}
}


WeTok achieves a new state-of-the-art in reconstruction fidelity, surpassing both discrete and continuous tokenizers, while offering high compression ratios.

📰 News

  • [2025.08.08] 🚀 🚀 🚀 We are excited to release WeTok, a powerful discrete tokenizer featuring our novel Group-Wise Lookup-Free Quantization (GQ) and a Generative Decoder (GD). Code and pretrained models are now available!

📖 Implementations

πŸ› οΈ Installation

  • Dependencies:
bash env.sh

Evaluation

  • Evaluation on ImageNet 50K Validation Set

The dataset should be organized as follows:

imagenet
└── val/
    ├── ...
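
Before launching a long evaluation run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the WeTok repository: the `count_images` helper and the `IMAGENET_ROOT` variable are hypothetical names, and the default path `./imagenet` is an assumption about where the dataset lives. Because the contents of `val/` are elided above, the count recurses into any subfolders rather than assuming a flat directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count image files under a directory, recursing into
# subfolders and matching common image extensions case-insensitively.
count_images() {
  find "$1" -type f \( -iname '*.jpeg' -o -iname '*.jpg' -o -iname '*.png' \) \
    | wc -l | tr -d '[:space:]'
}

# Assumed dataset root; override with IMAGENET_ROOT=/path/to/imagenet.
IMAGENET_ROOT="${IMAGENET_ROOT:-./imagenet}"

if [ -d "$IMAGENET_ROOT/val" ]; then
  # The full ImageNet validation set contains 50000 images.
  echo "Found $(count_images "$IMAGENET_ROOT/val") images under $IMAGENET_ROOT/val"
else
  echo "Missing directory: $IMAGENET_ROOT/val" >&2
fi
```

The same helper can be pointed at `MSCOCO2017/val2017`, where the complete Val2017 split contains 5,000 images.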

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_original_dist.sh
  • Evaluation on MS-COCO Val2017

The dataset should be organized as follows:

MSCOCO2017
└── val2017/
    ├── ...

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/mscocoval_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/mscoco_evaluation_original_dist.sh
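
The four evaluation scripts above can also be run back to back with their output captured per script. This is a minimal sketch under stated assumptions: `run_evals` and the `LOG_DIR` variable are hypothetical names that do not exist in the repository, and the scripts are assumed to take no arguments, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each script passed as an argument with bash and
# save its combined stdout/stderr to $LOG_DIR/<script name>.log.
# Returns non-zero if any script is missing or exits with an error.
run_evals() {
  local log_dir="${LOG_DIR:-logs}"
  local failed=0
  local script
  mkdir -p "$log_dir"
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "skip (not found): $script" >&2
      failed=1
      continue
    fi
    if bash "$script" > "$log_dir/$(basename "$script" .sh).log" 2>&1; then
      echo "ok: $script"
    else
      echo "FAILED: $script" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example, run from the repository root:
# run_evals \
#   scripts/evaluation/imagenet_evaluation_256_dist.sh \
#   scripts/evaluation/imagenet_evaluation_original_dist.sh \
#   scripts/evaluation/mscocoval_evaluation_256_dist.sh \
#   scripts/evaluation/mscoco_evaluation_original_dist.sh
```

A failing script does not stop the loop, so one broken dataset path does not block the remaining evaluations; the return status still signals that something went wrong.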

Inference

To quickly check each model's reconstruction quality, run:

bash scripts/inference/reconstruct_image.sh


Qualitative comparison of 512 × 512 image reconstruction on TokBench.


WeTok-AR-XL generated samples at 256 × 256 resolution.

❤️ Acknowledgement

Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of Open-MAGVIT2. We also drew inspiration from the methodologies presented in LFQ and BSQ. We are grateful for their contributions to the community.
