Commit eee10f0 · lijincheng committed
Parent(s): 67444c6

push custom data
.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+custom_data/** filter=lfs diff=lfs merge=lfs -text
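The added rule routes every file under custom_data/ through Git LFS. A minimal sketch of writing the same pattern by hand (equivalent to `git lfs track "custom_data/**"` when git-lfs is installed):

```shell
# Append the LFS routing rule for custom_data/ to .gitattributes,
# matching the line added in this commit.
echo 'custom_data/** filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Confirm the rule is present.
cat .gitattributes
```

With this attribute in place, files under custom_data/ are stored as LFS objects instead of regular git blobs on the next commit.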
README.md CHANGED

@@ -9,7 +9,7 @@ Jincheng Li*, Chunyu Xie*, Ji Ao, Dawei Leng†, Yuhui Yin (*Equal Contributi
 
 
 ## 🔥 News
-- 🚀 **[2025/
+- 🚀 **[2025/08/01]** We have updated the LMM-Det GitHub repository, and now you can test our models!
 - 🚀 **[2025/07/24]** We released the paper of [LMM-Det: Make Large Multimodal Models Excel in Object Detection](https://arxiv.org/abs/2507.18300).
 - 🚀 **[2025/06/26]** LMM-Det has been accepted by ICCV'25.
 
custom_data/custom_data.md ADDED

@@ -0,0 +1,16 @@
+# Data Curation
+
+In Stage IV, we curate a customized dataset to make LMM-Det excel in object detection while preserving its inherent capabilities, such as caption generation and VQA.
+
+## Step 1
+
+We generate pseudo labels on the COCO train set using [Salience-DETR](https://github.com/xiuqhou/Salience-DETR) (FocalNet-L backbone) and re-organize them into an instruction format. Note that the re-organized data consists of both ground-truth labels and pseudo labels.
+(In practice, this data is also used in Stage III.)
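The re-organization in Step 1 can be sketched as follows. The LLaVA-style conversation schema, the prompt text, and the box formatting are illustrative assumptions, not the authors' exact format:

```python
# Hypothetical sketch: merge ground-truth and pseudo-labeled boxes for one
# COCO image and wrap them in a conversation-style instruction record.
def to_instruction_record(image_file, gt_boxes, pseudo_boxes):
    """Combine GT and pseudo labels into one instruction-format sample."""
    # Re-organized data = ground-truth labels + pseudo labels (Step 1).
    boxes = gt_boxes + pseudo_boxes
    answer = "; ".join(
        f"{label}: [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]"
        for label, (x1, y1, x2, y2) in boxes
    )
    return {
        "image": image_file,
        "conversations": [
            {"from": "human", "value": "<image>\nDetect all objects in the image."},
            {"from": "gpt", "value": answer},
        ],
    }

record = to_instruction_record(
    "000000000001.jpg",
    gt_boxes=[("person", (0.10, 0.20, 0.50, 0.90))],
    pseudo_boxes=[("dog", (0.55, 0.60, 0.80, 0.95))],
)
```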
+
+## Step 2
+
+We remove the TextCaps data from the LLaVA-665K instruction data.
+
+## Step 3
+
+We concatenate the re-organized data and the LLaVA-665K instruction data (without TextCaps) to form the training data for Stage IV.
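Steps 2 and 3 amount to a filter followed by a concatenation. A minimal sketch, assuming (hypothetically) that LLaVA-665K records identify their source dataset via the image path:

```python
def is_textcaps(sample):
    # Assumption: the source dataset appears in the image path.
    return "textcaps" in sample.get("image", "").lower()

def build_stage4_data(llava_665k, reorganized_det):
    kept = [s for s in llava_665k if not is_textcaps(s)]  # Step 2: drop TextCaps
    return kept + reorganized_det                          # Step 3: concatenate

# Tiny illustrative inputs.
llava = [
    {"image": "coco/train2017/0001.jpg"},
    {"image": "textcaps/train_images/0002.jpg"},
]
det = [{"image": "coco/train2017/0003.jpg"}]
stage4 = build_stage4_data(llava, det)  # TextCaps sample removed
```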
custom_data/llava_665k_owlv2_pad_rm_textcaps_w_coco_reorganized_for_stage4.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45e7f79788fd0acf67bdf598ac184c5798907c1f7e58cc83d1d9ea123df67b0b
+size 1306355879
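The JSON is committed as a Git LFS pointer; the ~1.3 GB payload lives in LFS storage. A sketch of reading the pointer fields (the file name pointer.txt is illustrative):

```shell
# Write a sample LFS pointer with the fields from this commit.
cat > pointer.txt <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:45e7f79788fd0acf67bdf598ac184c5798907c1f7e58cc83d1d9ea123df67b0b
size 1306355879
EOF

# Extract the payload size in bytes from the pointer.
awk '$1 == "size" {print $2}' pointer.txt
```

In a clone of the repository, `git lfs pull` downloads the real file and replaces the pointer in the working tree.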