Safetensors
lijincheng committed
Commit eee10f0 · 1 Parent(s): 67444c6

push custom data
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+custom_data/** filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -9,7 +9,7 @@ Jincheng Li*, Chunyu Xie*, Ji Ao, Dawei Leng†, Yuhui Yin (*Equal Contributi
 
 
 ## 🔥 News
-- 🚀 **[2025/07/31]** We have updated the LMM-Det github repository, and now you can test our models!
+- 🚀 **[2025/08/01]** We have updated the LMM-Det github repository, and now you can test our models!
 - 🚀 **[2025/07/24]** We released the paper of [LMM-Det: Make Large Multimodal Models Excel in Object Detection](https://arxiv.org/abs/2507.18300).
 - 🚀 **[2025/06/26]** LMM-Det has been accepted by ICCV'25.
 
custom_data/custom_data.md ADDED
@@ -0,0 +1,16 @@
+# Data Curation
+
+In Stage IV, we curate a customized dataset to make LMM-Det excel in object detection while preserving its inherent capabilities, such as caption generation and VQA.
+
+## Step 1
+
+We generate pseudo labels on the COCO training set using [Salience-DETR](https://github.com/xiuqhou/Salience-DETR) (FocalNet-L backbone) and re-organize them into an instruction format. Note that the re-organized data consists of both ground-truth labels and pseudo labels.
+(In practice, this data is also used in Stage III.)
+
+## Step 2
+
+We remove the TextCaps data from the LLaVA-665K instruction data.
+
+## Step 3
+
+We concatenate the re-organized data and the LLaVA-665K instruction data (without TextCaps) to form the training data for Stage IV.
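Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual preprocessing script: the field name `image` and the convention that TextCaps entries carry `textcaps` in their image path are assumptions about the LLaVA-665K JSON layout.

```python
import json

def remove_textcaps(entries):
    # Assumption: TextCaps samples are identifiable by "textcaps"
    # appearing in the entry's image path.
    return [e for e in entries if "textcaps" not in e.get("image", "")]

def build_stage4_data(llava_665k, coco_reorganized):
    # Step 2: filter TextCaps out of LLaVA-665K;
    # Step 3: concatenate with the re-organized COCO instruction data.
    return remove_textcaps(llava_665k) + coco_reorganized

if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two JSON files (hypothetical entries).
    llava = [
        {"id": "a", "image": "coco/train2017/0001.jpg"},
        {"id": "b", "image": "textcaps/train_images/0002.jpg"},
    ]
    coco = [{"id": "c", "image": "coco/train2017/0003.jpg"}]
    merged = build_stage4_data(llava, coco)
    print(json.dumps([e["id"] for e in merged]))  # the textcaps entry is dropped
```

The merged list would then be dumped with `json.dump` to produce a single Stage IV training file, analogous to the JSON added in this commit.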
custom_data/llava_665k_owlv2_pad_rm_textcaps_w_coco_reorganized_for_stage4.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45e7f79788fd0acf67bdf598ac184c5798907c1f7e58cc83d1d9ea123df67b0b
+size 1306355879