RangiLyu committed 2cab00c (unverified, parent: 09af1e3)

tech-report link (#15)

Files changed (1): README.md (+15, -2)
README.md CHANGED
@@ -1,10 +1,10 @@
 # 🤖 Multi-modal GPT
 
-Train a multi-modal chatbot with visual and language instructions!
+Train a multi-modal chatbot with visual and language instructions!
 
 Based on the open-source multi-modal model [OpenFlamingo](https://github.com/mlfoundations/open_flamingo), we create various **visual instruction** data with open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. Additionally, we also train the language model component of OpenFlamingo using only **language-only instruction** data.
 
-The **joint training** of visual and language instructions effectively improves the performance of the model!
+The **joint training** of visual and language instructions effectively improves the performance of the model! For more details please refer to our [technical report](https://arxiv.org/abs/2305.04790).
 
 Welcome to join us!
 
@@ -178,3 +178,16 @@ torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
 - [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
 - [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main)
 - [Instruction Tuning with GPT-4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
+
+If you find our project useful for your research and applications, please cite using this BibTeX:
+
+```bibtex
+@misc{gong2023multimodalgpt,
+      title={MultiModal-GPT: A Vision and Language Model for Dialogue with Humans},
+      author={Tao Gong and Chengqi Lyu and Shilong Zhang and Yudong Wang and Miao Zheng and Qian Zhao and Kuikun Liu and Wenwei Zhang and Ping Luo and Kai Chen},
+      year={2023},
+      eprint={2305.04790},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```