Improve model card with detailed info, links, tags, and sample usage

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +86 -20
README.md CHANGED
@@ -1,36 +1,91 @@
1
  ---
2
- library_name: transformers
3
- license: apache-2.0
4
  base_model: facebook/detr-resnet-101-dc5
5
- tags:
6
- - generated_from_trainer
7
  datasets:
8
  - Voxel51/fisheye8k
9
  model-index:
10
  - name: fisheye8k_facebook_detr-resnet-101-dc5
11
  results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
-
17
  # fisheye8k_facebook_detr-resnet-101-dc5
18
 
19
- This model is a fine-tuned version of [facebook/detr-resnet-101-dc5](https://huggingface.co/facebook/detr-resnet-101-dc5) on the generator dataset.
 
20
  It achieves the following results on the evaluation set:
21
  - Loss: 2.6740
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
  ## Intended uses & limitations
28
 
29
- More information needed
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
  ## Training procedure
36
 
@@ -49,14 +104,14 @@ The following hyperparameters were used during training:
49
  ### Training results
50
 
51
  | Training Loss | Epoch | Step | Validation Loss |
52
- |:-------------:|:-----:|:-----:|:---------------:|
53
- | 2.1508 | 1.0 | 5288 | 2.4721 |
54
- | 1.7423 | 2.0 | 10576 | 2.3029 |
55
- | 1.5881 | 3.0 | 15864 | 2.2454 |
56
- | 1.5641 | 4.0 | 21152 | 2.2912 |
57
- | 1.4438 | 5.0 | 26440 | 2.2912 |
58
- | 1.4503 | 6.0 | 31728 | 2.5056 |
59
- | 1.3487 | 7.0 | 37016 | 2.5812 |
60
  | 1.2777 | 8.0 | 42304 | 2.6740 |
61
 
62
 
@@ -67,4 +122,15 @@ The following hyperparameters were used during training:
67
  - Datasets 3.2.0
68
  - Tokenizers 0.21.0
69
 
70
- Mcity Data Engine: https://arxiv.org/abs/2504.21614
1
  ---
 
 
2
  base_model: facebook/detr-resnet-101-dc5
 
 
3
  datasets:
4
  - Voxel51/fisheye8k
5
+ library_name: transformers
6
+ license: mit
7
+ pipeline_tag: object-detection
8
+ tags:
9
+ - generated_from_trainer
10
+ - object-detection
11
+ - detr
12
+ - computer-vision
13
+ - its
14
+ - autonomous-driving
15
  model-index:
16
  - name: fisheye8k_facebook_detr-resnet-101-dc5
17
  results: []
18
  ---
19
 
 
 
 
20
  # fisheye8k_facebook_detr-resnet-101-dc5
21
 
22
+ This model is a fine-tuned version of [facebook/detr-resnet-101-dc5](https://huggingface.co/facebook/detr-resnet-101-dc5) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed as part of the **Mcity Data Engine** initiative.
23
+
24
  It achieves the following results on the evaluation set:
25
  - Loss: 2.6740
26
 
27
+ ## Paper
28
+ This model was presented in the paper:
29
+ [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://arxiv.org/abs/2504.21614).
30
+
31
+ ## Project Page
32
+ For more information about the **Mcity Data Engine**, visit the official project page: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/).
33
+
34
+ ## Code
35
+ The code for the **Mcity Data Engine** is publicly available on GitHub: [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine).
36
+
37
  ## Model description
38
 
39
+ The `fisheye8k_facebook_detr-resnet-101-dc5` model is an object detection model for Intelligent Transportation Systems (ITS), fine-tuned from DETR with a ResNet-101 backbone. It was developed as part of the Mcity Data Engine, an open-source system for selecting and labeling data for machine learning models, in particular for detecting long-tail and novel classes of interest in large amounts of unlabeled data from vehicle fleets and roadside perception systems. The model serves as an example of the iterative model improvement through open-vocabulary data selection that this framework supports.
40
 
41
  ## Intended uses & limitations
42
 
43
+ **Intended Uses:**
44
+ This model is intended for research and development on object detection in Intelligent Transportation Systems (ITS). It detects the classes defined in its `id2label` mapping (e.g., Bus, Bike, Car, Pedestrian, Truck) in images from fisheye cameras. It can serve as a starting point for AI algorithms that require robust object grounding and for exploring iterative model improvement techniques that focus on rare and novel classes.
45
+
46
+ **Limitations:**
47
+ * The model's performance is primarily validated on the Fisheye8K dataset and may vary when applied to other datasets or real-world scenarios with different camera types, environments, or object distributions.
48
+ * While the underlying research focuses on open-vocabulary detection and long-tail classes, generalization to entirely unseen object categories or extremely rare instances might still require further data selection and retraining within the Mcity Data Engine framework.
49
+ * The model provides bounding box predictions and class labels but does not offer instance segmentation or other more granular visual understanding capabilities.
50
+
51
+ ## Sample Usage
52
+
53
+ You can use this model with the Hugging Face `transformers` library for object detection:
54
+
55
+ ```python
+ import torch
+ from transformers import AutoImageProcessor, AutoModelForObjectDetection
+ from PIL import Image
+ import requests
+
+ # Load image processor and model
+ image_processor = AutoImageProcessor.from_pretrained("mcity-data-engine/fisheye8k_facebook_detr-resnet-101-dc5")
+ model = AutoModelForObjectDetection.from_pretrained("mcity-data-engine/fisheye8k_facebook_detr-resnet-101-dc5")
+
+ # Example image (replace with your image path or URL)
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+
+ # Process image and get model outputs (no gradients are needed for inference)
+ inputs = image_processor(images=image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Post-process outputs to get detected objects.
+ # PIL gives size as (width, height); target_sizes expects (height, width).
+ target_sizes = torch.tensor([image.size[::-1]])
+ results = image_processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]
+
+ print("Detected objects:")
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
+     box = [round(i, 2) for i in box.tolist()]
+     print(
+         f" - {model.config.id2label[label.item()]} with confidence "
+         f"{round(score.item(), 3)} at location {box}"
+     )
+ ```
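The post-processing step in the sample above is easier to follow with the box arithmetic written out. DETR-style models predict boxes as normalized `(center_x, center_y, width, height)` values, and post-processing rescales them to absolute `(x_min, y_min, x_max, y_max)` pixel coordinates using the `(height, width)` passed in `target_sizes`. The following is a minimal sketch of that conversion in plain Python, not the library's implementation; the helper name `cxcywh_to_xyxy` and the image size are hypothetical and chosen only for illustration.

```python
def cxcywh_to_xyxy(box, image_height, image_width):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels.

    Hypothetical helper for illustration only.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height
    return [x0, y0, x1, y1]


# PIL reports image size as (width, height); reversing it gives (height, width),
# which is the order used for target_sizes in the sample above.
pil_size = (640, 480)  # illustrative values, not taken from the dataset
image_height, image_width = pil_size[::-1]

# A box centered in the image, a quarter of its width and half of its height.
print(cxcywh_to_xyxy((0.5, 0.5, 0.25, 0.5), image_height, image_width))
# -> [240.0, 120.0, 400.0, 360.0]
```

Passing the size in the wrong order would scale x coordinates by the height and y coordinates by the width, which is why the sample reverses `image.size`.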
85
 
86
  ## Training and evaluation data
87
 
88
+ This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). This dataset is specifically designed for object detection in images captured by fisheye cameras, making it highly relevant for applications in intelligent transportation systems.
89
 
90
  ## Training procedure
91
 
 
104
  ### Training results
105
 
106
  | Training Loss | Epoch | Step | Validation Loss |
107
+ |:-------------:|:-----:|:-----:|:---------------:|
108
+ | 2.1508 | 1.0 | 5288 | 2.4721 |
109
+ | 1.7423 | 2.0 | 10576 | 2.3029 |
110
+ | 1.5881 | 3.0 | 15864 | 2.2454 |
111
+ | 1.5641 | 4.0 | 21152 | 2.2912 |
112
+ | 1.4438 | 5.0 | 26440 | 2.2912 |
113
+ | 1.4503 | 6.0 | 31728 | 2.5056 |
114
+ | 1.3487 | 7.0 | 37016 | 2.5812 |
115
  | 1.2777 | 8.0 | 42304 | 2.6740 |
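
The table can be read programmatically to see which epoch gave the lowest validation loss. The sketch below copies the table's values into a list (the variable names are illustrative) and compares the best epoch with the final one. Training loss keeps falling while validation loss rises after epoch 3, a pattern that often indicates overfitting, so the reported evaluation loss of 2.6740 reflects the last epoch and not the best one.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above
rows = [
    (2.1508, 1, 5288, 2.4721),
    (1.7423, 2, 10576, 2.3029),
    (1.5881, 3, 15864, 2.2454),
    (1.5641, 4, 21152, 2.2912),
    (1.4438, 5, 26440, 2.2912),
    (1.4503, 6, 31728, 2.5056),
    (1.3487, 7, 37016, 2.5812),
    (1.2777, 8, 42304, 2.6740),
]

# Epoch with the lowest validation loss versus the final epoch
best = min(rows, key=lambda row: row[3])
final = rows[-1]
gap = round(final[3] - best[3], 4)

print(f"lowest validation loss: {best[3]} at epoch {best[1]}")
print(f"final validation loss:  {final[3]} at epoch {final[1]}")
print(f"difference: {gap}")
```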
116
 
117
 
 
122
  - Datasets 3.2.0
123
  - Tokenizers 0.21.0
124
 
125
+ ## Citation
126
+
127
+ If you use the Mcity Data Engine or this model in your research, please cite the project:
128
+
129
+ ```bibtex
130
+ @article{bogdoll2025mcitydataengine,
131
+ title={Mcity Data Engine},
132
+ author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
133
+ journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
134
+ year={2025}
135
+ }
136
+ ```