Commit
·
f41a66e
1
Parent(s):
fcee07d
Revert "fixed README.md file"
This reverts commit fcee07db5df946ec0de33aefeb928233cd7d89f0.
undid the README.md :wq
:wq
- README.md +11 -227
- images/convnext_architecture.png +0 -0
- reports/.gitkeep +0 -0
- reports/figures/.gitkeep +0 -0
- reports/figures/Training_Validation-Loss_Accuracy.png +0 -0
README.md
CHANGED
@@ -1,230 +1,14 @@
|
|
1 |
-
# CAR CLASSIFICATION - Brand, Model & Model Year
|
2 |
-
|
3 |
-
This project is a deep learning pipeline that classifies car **brand**, **model**, and **model year** from a single image using a fine-tuned ConvNeXt model. It uses the [Stanford Cars dataset](https://huggingface.co/datasets/tanganke/stanford_cars) and leverages **transfer learning** with `facebook/convnext-large-224`. Built in **PyTorch**, this modular and scalable pipeline supports training, evaluation, and inference.
|
4 |
-
|
5 |
-
---
|
6 |
-
|
7 |
-
## Key Features
|
8 |
-
|
9 |
-
- Download and preprocess image data from Hugging Face
|
10 |
-
- Fine-tune pretrained ConvNeXt models (modern ConvNets inspired by transformers)
|
11 |
-
- Track training metrics and model checkpoints
|
12 |
-
- Predict the class of custom input images using saved models
|
13 |
-
- Modular design for training, evaluation, and inference
|
14 |
-
|
15 |
-
|
16 |
-
---
|
17 |
-
|
18 |
-
## 🧰 Installation
|
19 |
-
|
20 |
-
|
21 |
-
## Setup Instructions
|
22 |
-
|
23 |
-
1. Clone the repo from GitHub
|
24 |
-
``` bash
|
25 |
-
git clone https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1
|
26 |
-
|
27 |
-
cd CarClassificationTeam1
|
28 |
-
```
|
29 |
-
|
30 |
-
2. Create and activate a virtual environment
|
31 |
-
``` bash
|
32 |
-
python -m venv venv
|
33 |
-
source venv/bin/activate # On Windows: venv\Scripts\activate
|
34 |
-
```
|
35 |
-
|
36 |
-
3. Install dependencies
|
37 |
-
``` bash
|
38 |
-
pip install -r requirements.txt
|
39 |
-
```
|
40 |
-
|
41 |
-
4. Download the dataset
|
42 |
-
``` bash
|
43 |
-
python -m src.data.datadownloader
|
44 |
-
```
|
45 |
-
|
46 |
-
---
|
47 |
-
|
48 |
-
## Requirements
|
49 |
-
|
50 |
-
- Python 3.8+
|
51 |
-
- PyTorch 2.3.0+cu126
|
52 |
-
- torchvision 0.18.0+cu126
|
53 |
-
- torchaudio 2.3.0+cu126
|
54 |
-
- transformers
|
55 |
-
- datasets
|
56 |
-
- Other dependencies as listed in requirements.txt
|
57 |
-
|
58 |
-
---
|
59 |
-
|
60 |
-
## Model Architecture
|
61 |
-
|
62 |
-

|
63 |
-
|
64 |
-
We fine-tuned a pretrained [ConvNeXt](https://huggingface.co/facebook/convnext-base-224) convolutional model:
|
65 |
-
|
66 |
-
- **Model**: ConvNeXt-Base (224x224 resolution)
|
67 |
-
- **Pretrained on**: ImageNet-1k
|
68 |
-
- **Fine-tuned on**: Stanford Cars (196 classes)
|
69 |
-
- **Transfer Learning**: Only the last **two ConvNeXt stages** and the **classification head** were trained
|
70 |
-
|
71 |
-
|
72 |
-
Since the **Stanford Cars** dataset contains a relatively small number of examples (~8,100 training and ~8,000 validation images), we adopted a **transfer learning** strategy. The ConvNeXt model was initialized with pretrained weights from ImageNet-1k; only the final classification head was randomly initialized, sized for our 196 target classes.
|
73 |
-
|
74 |
-
To balance generalization and training efficiency, we unfroze and trained only the last two stages of the ConvNeXt backbone (Stages 3 and 4), along with the classification head. Earlier layers remained frozen to preserve robust pretrained features.
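
As a rough illustration of the freezing strategy described above, the sketch below marks parameters as trainable by name. The prefixes are assumptions about how Hugging Face's `ConvNextForImageClassification` names its parameters (stages are assumed to be 0-indexed, so Stages 3 and 4 become indices 2 and 3). `is_trainable` and `freeze_early_layers` are hypothetical helpers, not code from this repository.

```python
# Assumed parameter-name prefixes (verify against model.named_parameters()).
# Stage indices are assumed 0-based: "Stages 3 and 4" -> stages.2 and stages.3.
TRAINABLE_PREFIXES = (
    "convnext.encoder.stages.2.",
    "convnext.encoder.stages.3.",
    "classifier.",
)


def is_trainable(param_name: str) -> bool:
    """Return True if the parameter belongs to a block that should be fine-tuned."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def freeze_early_layers(model) -> int:
    """Set requires_grad on every parameter and return how many stay trainable.

    `model` only needs a `named_parameters()` method yielding (name, param)
    pairs, as torch.nn.Module provides.
    """
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        trainable += int(param.requires_grad)
    return trainable
```

With this approach, an optimizer would typically be given only the parameters whose `requires_grad` is still `True`. Printing the names from `model.named_parameters()` first is a cheap way to confirm the assumed prefixes match the loaded checkpoint.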
|
75 |
-
|
76 |
-
|
77 |
-
**Data Augmentation**:
|
78 |
-
|
79 |
-
```python
|
80 |
-
transforms.Compose([
|
81 |
-
transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0), ratio=(0.75, 1.33)),
|
82 |
-
transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
|
83 |
-
transforms.RandomHorizontalFlip(),
|
84 |
-
transforms.RandomRotation(degrees=15),
|
85 |
-
transforms.RandomGrayscale(p=0.1),
|
86 |
-
transforms.ToTensor(),
|
87 |
-
transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),
|
88 |
-
transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
|
89 |
-
transforms.Normalize(mean=mean, std=std),
|
90 |
-
])
|
91 |
-
```
|
92 |
-
|
93 |
-
---
|
94 |
-
|
95 |
-
## Performance
|
96 |
-
|
97 |
-
- **Train Accuracy**: `98.62%`
|
98 |
-
- **Validation Accuracy**: `92.30%`
|
99 |
-
- **Train Loss (Cross Entropy)**: `0.9010`
|
100 |
-
- **Validation Loss (Cross Entropy)**: `1.1231`
|
101 |
-
|
102 |
-
---
|
103 |
-
|
104 |
-
## Usage (Example)
|
105 |
-
|
106 |
-
```python
|
107 |
-
from PIL import Image
|
108 |
-
from transformers import AutoImageProcessor, ConvNextForImageClassification
|
109 |
-
import torch
|
110 |
-
|
111 |
-
# Load model and processor
|
112 |
-
model = ConvNextForImageClassification.from_pretrained("todorristov/car_classification_model")
|
113 |
-
processor = AutoImageProcessor.from_pretrained("todorristov/car_classification_model")
|
114 |
-
|
115 |
-
# Load and preprocess image
|
116 |
-
image = Image.open("example.jpg").convert("RGB")
|
117 |
-
inputs = processor(images=image, return_tensors="pt")
|
118 |
-
|
119 |
-
# Predict
|
120 |
-
with torch.no_grad():
|
121 |
-
logits = model(**inputs).logits
|
122 |
-
predicted_class = logits.argmax(-1).item()
|
123 |
-
|
124 |
-
print(f"Predicted class ID: {predicted_class}")
|
125 |
-
```
|
126 |
-
|
127 |
-
---
|
128 |
-
|
129 |
-
## Training Details
|
130 |
-
|
131 |
-
- **Framework**: PyTorch
|
132 |
-
- **Hardware**: NVIDIA RTX 4060
|
133 |
-
- **Epochs**: 32 scheduled (early stopping ended training after 28 epochs)
|
134 |
-
- **Batch Size**: 32
|
135 |
-
- **Optimizer**: AdamW (lr=1e-4, weight_decay=1e-4)
|
136 |
-
- **Loss Function**: Cross Entropy(label_smoothing=0.1)
|
137 |
-
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=2, min_lr=1e-6)
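
The loss setting listed above can be unpacked with a small pure-Python sketch. It assumes the common definition of label smoothing, in which the target puts `1 - smoothing` on the true class and spreads `smoothing` uniformly over all classes. The function names are hypothetical, and this is not the training code from this repository.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross entropy of one example against a label-smoothed target."""
    num_classes = len(logits)
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_norm for z in logits]
    # Smoothed target: smoothing / K on every class, plus (1 - smoothing) on the true one.
    uniform = smoothing / num_classes
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        weight = uniform + ((1.0 - smoothing) if k == target else 0.0)
        loss -= weight * log_p
    return loss


def loss_floor(num_classes, smoothing=0.1):
    """Lowest reachable loss: the entropy of the smoothed target distribution."""
    on = 1.0 - smoothing + smoothing / num_classes
    off = smoothing / num_classes
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))
```

Under these assumptions the loss cannot reach zero even for a perfect classifier. This is one plausible reason a training loss near 0.9 can coexist with high training accuracy when 196 classes and `label_smoothing=0.1` are used.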
|
138 |
-
|
139 |
-

|
140 |
-
|
141 |
-
This result demonstrates the effectiveness of fine-tuning high-capacity pretrained models on medium-sized, domain-specific datasets. The model generalizes well despite visual similarities between different car models and years.
|
142 |
-
|
143 |
-
---
|
144 |
-
|
145 |
-
## ⚠️ Limitations
|
146 |
-
|
147 |
-
- Trained only on 196 classes from Stanford Cars (mostly 1990–2012 U.S. models)
|
148 |
-
- Poor performance on:
|
149 |
-
- Damaged or modified vehicles
|
150 |
-
- Non-standard angles or lighting
|
151 |
-
- Not suitable for unseen/new car models; retraining needed
|
152 |
-
|
153 |
-
---
|
154 |
-
|
155 |
-
## Project Details
|
156 |
-
|
157 |
-
- **Developed by**: Todor Ristov, Goran Nikoloski, Milana Sokolova
|
158 |
-
- **For**: TwinCar Project, Sols (Skopje, North Macedonia)
|
159 |
-
- **Language**: Python
|
160 |
-
- **Framework**: PyTorch
|
161 |
-
- **License**: [MIT](LICENSE)
|
162 |
-
|
163 |
-
---
|
164 |
-
|
165 |
-
## Resources
|
166 |
-
|
167 |
-
- Stanford Cars Dataset: [https://huggingface.co/datasets/tanganke/stanford\_cars](https://huggingface.co/datasets/tanganke/stanford_cars)
|
168 |
-
- 🤗 Model Card: [https://huggingface.co/sols/car-classification-convnext](https://huggingface.co/sols/car-classification-convnext)
|
169 |
-
- GitHub Repository: [https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1](https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1)
|
170 |
-
- Demo Space: [https://huggingface.co/spaces/todorristov/car-classification-convnext](https://huggingface.co/spaces/todorristov/car-classification-convnext)
|
171 |
-
|
172 |
---
|
173 |
-
|
174 |
-
|
175 |
-
|
176 |
-
|
177 |
---
|
178 |
-
## Project Structure
|
179 |
-
|
180 |
-
```
|
181 |
-
project_root/
|
182 |
-
│
|
183 |
-
├── images/ # Model architecture visualizations
|
184 |
-
│
|
185 |
-
├── models/ # Stores trained model checkpoints (e.g., best_model.pt)
|
186 |
-
│   └── best_model.pt
|
187 |
-
│
|
188 |
-
├── notebooks/ # Jupyter notebooks for model exploration and experiments
|
189 |
-
│
|
190 |
-
├── reports/ # Training logs (loss, accuracy, LR, time, etc.)
|
191 |
-
│
|
192 |
-
├── src/ # Source code
|
193 |
-
│   ├── data/ # Data-related scripts
|
194 |
-
│   │   ├── datadownloader.py # Downloads and saves dataset to local folders
|
195 |
-
│   │   └── datatransforms.py # Data augmentation and preprocessing transforms
|
196 |
-
│   │
|
197 |
-
│   ├── models/ # Model utilities
|
198 |
-
│   │   └── load_model.py # Loads model, processor, and device
|
199 |
-
│   │
|
200 |
-
│   ├── utils/ # Utility scripts
|
201 |
-
│   │   └── save_label_map.py # Saves class label map
|
202 |
-
│   │
|
203 |
-
│   ├── evaluate.py # Evaluation logic per epoch
|
204 |
-
│   ├── inference.py # Inference script for classifying new images
|
205 |
-
│   ├── train_utils.py # Training helper functions (e.g., metric calc, logging)
|
206 |
-
│   ├── train.py # Main training script
|
207 |
-
│   └── visualize.py # Visualizations (e.g., confusion matrix, sample predictions)
|
208 |
-
│
|
209 |
-
├── README.md # Project documentation
|
210 |
-
└── requirements.txt # Project dependencies
|
211 |
-
```
|
212 |
-
|
213 |
-
---
|
214 |
-
|
215 |
-
## Citation
|
216 |
-
|
217 |
-
```
|
218 |
-
@misc{twin-car-classification,
|
219 |
-
title={Car Classification - Brand, Model & Model Year},
|
220 |
-
author={Todor Ristov},
|
221 |
-
year={2025},
|
222 |
-
howpublished={\url{https://huggingface.co/todorristov/car_classification_model}},
|
223 |
-
note={A deep learning pipeline for vehicle recognition.}
|
224 |
-
}
|
225 |
-
```
|
226 |
-
|
227 |
-
---
|
228 |
-
|
229 |
-
Feel free to ⭐ the repo and share your feedback!
|
230 |
|
1 |
---
|
2 |
+
title: Car Classification Convnext
|
3 |
+
emoji: π
|
4 |
+
colorFrom: yellow
|
5 |
+
colorTo: blue
|
6 |
+
sdk: gradio
|
7 |
+
sdk_version: 5.34.2
|
8 |
+
app_file: app.py
|
9 |
+
pinned: false
|
10 |
+
license: mit
|
11 |
+
short_description: Car classification model trained on Stanford Cars
|
12 |
---
|
13 |
|
14 |
+
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
images/convnext_architecture.png
DELETED
Binary file (52.9 kB)
|
|
reports/.gitkeep
DELETED
File without changes
|
reports/figures/.gitkeep
DELETED
File without changes
|
reports/figures/Training_Validation-Loss_Accuracy.png
DELETED
Binary file (58.4 kB)
|
|