tech-report link (#15)
README.md CHANGED
@@ -1,10 +1,10 @@
 # 🤖 Multi-modal GPT
 
 Train a multi-modal chatbot with visual and language instructions!
 
 Based on the open-source multi-modal model [OpenFlamingo](https://github.com/mlfoundations/open_flamingo), we create various **visual instruction** data with open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. Additionally, we train the language model component of OpenFlamingo using only **language-only instruction** data.
 
-The **joint training** of visual and language instructions effectively improves the performance of the model!
+The **joint training** of visual and language instructions effectively improves the performance of the model! For more details, please refer to our [technical report](https://arxiv.org/abs/2305.04790).
 
 Welcome to join us!
 
@@ -178,3 +178,16 @@ torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
 - [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
 - [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main)
 - [Instruction Tuning with GPT-4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
+
+If you find our project useful for your research and applications, please cite using this BibTeX:
+
+```bibtex
+@misc{gong2023multimodalgpt,
+  title={MultiModal-GPT: A Vision and Language Model for Dialogue with Humans},
+  author={Tao Gong and Chengqi Lyu and Shilong Zhang and Yudong Wang and Miao Zheng and Qian Zhao and Kuikun Liu and Wenwei Zhang and Ping Luo and Kai Chen},
+  year={2023},
+  eprint={2305.04790},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
+```
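
The headline claim in the edited README is the **joint training** of visual and language instructions. As a rough illustration of that idea only, here is a minimal sketch of one common way to do it: interleave batches from a vision-language instruction dataset and a language-only instruction dataset inside a single fine-tuning loop. Everything below (`ToyInstructionDataset`, `ToyModel`, the alternating loop) is hypothetical scaffolding for illustration, not the actual code in `mmgpt/train/instruction_finetune.py`.

```python
# Illustrative sketch of joint visual + language-only instruction training.
# All classes and shapes here are toy assumptions, not mmgpt's actual code.
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class ToyInstructionDataset(Dataset):
    """Stand-in for a visual or language-only instruction dataset."""

    def __init__(self, size: int, with_images: bool):
        self.size = size
        self.with_images = with_images

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> dict:
        sample = {"input_ids": torch.randint(0, 1000, (16,)).float()}
        if self.with_images:
            sample["pixel_values"] = torch.randn(3, 8, 8)
        return sample


class ToyModel(nn.Module):
    """Toy model: projects tokens, optionally conditioned on image features."""

    def __init__(self):
        super().__init__()
        self.text_proj = nn.Linear(16, 8)
        self.image_proj = nn.Linear(3 * 8 * 8, 8)

    def forward(self, input_ids, pixel_values=None):
        out = self.text_proj(input_ids)
        if pixel_values is not None:
            out = out + self.image_proj(pixel_values.flatten(1))
        return out.pow(2).mean()  # dummy scalar standing in for a real loss


model = ToyModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

visual_loader = DataLoader(ToyInstructionDataset(32, with_images=True), batch_size=4)
language_loader = DataLoader(ToyInstructionDataset(32, with_images=False), batch_size=4)

# Joint training: each iteration draws one visual batch and one
# language-only batch, so both supervision signals update the same weights.
for visual_batch, language_batch in zip(visual_loader, language_loader):
    for batch in (visual_batch, language_batch):
        loss = model(**batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In the repository itself, fine-tuning is launched with the `torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py` command visible in the hunk header above; the sketch only conveys the batch-mixing idea behind the joint-training claim.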