
Bagel‑Zebra‑CoT

A vision–language model fine‑tuned on the Zebra‑CoT dataset to generate high-quality interleaved visual chain‑of‑thought.


[Figure: Bagel-Zebra-CoT example trace]



Model Description

Bagel‑Zebra‑CoT is fine-tuned from Bagel‑7B on the Zebra‑CoT dataset. It is trained to generate interleaved text and image traces as part of its own reasoning process.
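
To make "interleaved text and image traces" concrete, here is a minimal sketch of how such a trace could be represented in Python. The `Step` class, its field names, and the example steps are hypothetical and purely illustrative; they are not the schema of the Zebra‑CoT dataset or the model's actual output format.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Step:
    """One segment of a reasoning trace (hypothetical structure)."""
    kind: Literal["text", "image"]
    content: str  # the text itself, or a path to an image file (assumption)


def summarize(trace: List[Step]) -> str:
    """Return the order of modalities in the trace, e.g. 'text -> image'."""
    return " -> ".join(step.kind for step in trace)


# Made-up example: text reasoning, an intermediate sketch, then more text.
trace = [
    Step("text", "Rotate the L-shaped piece 90 degrees clockwise."),
    Step("image", "step_1.png"),
    Step("text", "The piece now fits the gap in the top-right corner."),
]

print(summarize(trace))  # text -> image -> text
```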


Usage

For interleaved text and image inference and training with our model, please refer to our GitHub repository.
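
Before running the scripts in the GitHub repository, you will likely need the checkpoint files locally. The sketch below shows one way to fetch them with `snapshot_download` from the `huggingface_hub` package. The repository id is the one shown on this model's Hub page; the `models/` directory layout and the helper names are assumptions, and the inference scripts may expect a different path.

```python
from pathlib import Path

# Repository id as shown on this model's Hugging Face page.
REPO_ID = "multimodal-reasoning-lab/Bagel-Zebra-CoT"


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Map a Hub repo id to a local folder (layout is an assumption)."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str = REPO_ID, root: str = "models") -> Path:
    """Download all files of the repo into the local folder and return it."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and network access.
    print(download_checkpoint())
```

The returned path can then be passed to whatever checkpoint argument the repository's inference code expects.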

For general information and other details, please refer to the official Bagel GitHub repository.


Dataset

  • Zebra‑CoT: 182,384 interleaved text‑image reasoning samples across 18 sub‑tasks in 4 categories (2D visual, 3D visual, scientific reasoning, visual logic & strategic games).

License

Bagel‑Zebra‑CoT is licensed under the Apache 2.0 license. It is fine-tuned from ByteDance-Seed/BAGEL-7B-MoT, which was itself fine-tuned from Qwen2.5-7B-Instruct and the siglip-so400m-14-384-flash-attn2 model and uses the FLUX.1-schnell VAE. All of these are released under Apache 2.0.


Citation

If you use this model, please cite:

@misc{li2025zebracot,
      title={Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning}, 
      author={Ang Li and Charles Wang and Kaiyu Yue and Zikui Cai and Ollie Liu and Deqing Fu and Peng Guo and Wang Bill Zhu and Vatsal Sharan and Robin Jia and Willie Neiswanger and Furong Huang and Tom Goldstein and Micah Goldblum},
      year={2025},
      eprint={2507.16746},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.16746}, 
}
