CAR CLASSIFICATION - Brand, Model & Model Year
This project is a deep learning pipeline that classifies car brand, model, and model year from a single image using a fine-tuned ConvNeXt model. It uses the Stanford Cars dataset and leverages transfer learning with facebook/convnext-large-224. Built in PyTorch, this modular and scalable pipeline supports training, evaluation, and inference.
Key Features
- Download and preprocess image data from Hugging Face
- Fine-tune pretrained ConvNeXt models (modern ConvNets inspired by transformers)
- Track training metrics and model checkpoints
- Predict the class of custom input images using saved models
- Modular design for training, evaluation, and inference
Installation
Setup Instructions
- Clone the repo from GitHub
git clone https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1
cd CarClassificationTeam1
- Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Download the dataset
python -m src.data.datadownloader # module path per the project structure; adjust if the downloader lives elsewhere
Requirements
- Python 3.8+
- PyTorch 2.3.0+cu126
- torchvision 0.18.0+cu126
- torchaudio 2.3.0+cu126
- transformers
- datasets
- Other dependencies as listed in requirements.txt
Model Architecture
We fine-tuned a pretrained ConvNeXt model, a convolutional network whose design borrows ideas from vision transformers:
- Model: ConvNeXt-Base (224x224 resolution)
- Pretrained on: ImageNet-1k
- Fine-tuned on: Stanford Cars (196 classes)
- Transfer Learning: Only the last two ConvNeXt stages and the classification head were trained
Because the Stanford Cars dataset contains relatively few examples (~8,100 training and ~8,000 validation images), we adopted a transfer learning strategy. The ConvNeXt backbone was initialized with weights pretrained on ImageNet-1k, and the classification head was replaced with a randomly initialized layer for our 196 target classes.
To balance generalization and training efficiency, we unfroze and trained only the last two stages of the ConvNeXt backbone (Stages 3 and 4), along with the classification head. Earlier layers remained frozen to preserve robust pretrained features.
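The freezing strategy described above can be sketched as follows. This is a minimal illustration, not the project's actual training code: the helper name `freeze_all_but_last_stages` is hypothetical, and a tiny randomly initialized configuration stands in for the pretrained checkpoint so the snippet runs without downloading weights. It assumes the Hugging Face `ConvNextForImageClassification` layout, where the backbone stages live in `model.convnext.encoder.stages` and the head in `model.classifier`.

```python
from transformers import ConvNextConfig, ConvNextForImageClassification


def freeze_all_but_last_stages(model, num_trainable_stages=2):
    """Freeze everything, then unfreeze the last backbone stages and the head.

    Hypothetical helper for illustration; the project's own code may differ.
    """
    # Start from a fully frozen model (embeddings, all stages, final norm, head).
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze only the last `num_trainable_stages` backbone stages.
    for stage in model.convnext.encoder.stages[-num_trainable_stages:]:
        for param in stage.parameters():
            param.requires_grad = True

    # The classification head is always trained.
    for param in model.classifier.parameters():
        param.requires_grad = True
    return model


# Assumption: a tiny random config so the sketch runs offline.
config = ConvNextConfig(hidden_sizes=[8, 16, 32, 64], depths=[1, 1, 1, 1], num_labels=196)
model = freeze_all_but_last_stages(ConvNextForImageClassification(config))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")
```

With real weights, the model would instead be loaded through `ConvNextForImageClassification.from_pretrained(...)` with `num_labels=196` and `ignore_mismatched_sizes=True`, so that the 1,000-class ImageNet head is replaced by a fresh 196-class head. In this sketch the final pooling layer norm stays frozen, which is one reading of "only the last two stages and the classification head".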
Data Augmentation:
from torchvision import transforms

# image_size, mean, and std are assumed to be defined beforehand
# (for example, taken from the model's image processor).
transforms.Compose([
    transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0), ratio=(0.75, 1.33)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),  # the transforms below operate on tensors
    transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
    transforms.Normalize(mean=mean, std=std),
])
Performance
- Train Accuracy: 98.62%
- Validation Accuracy: 92.30%
- Train Loss (Cross Entropy): 0.9010
- Validation Loss (Cross Entropy): 1.1231
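The loss values can look high next to the accuracies, but the training setup uses cross entropy with label_smoothing=0.1, which puts a floor under the loss. A small worked computation illustrates this, assuming PyTorch's convention of mixing the one-hot target with a uniform distribution over all 196 classes. The function name `label_smoothing_loss_floor` is hypothetical.

```python
import math


def label_smoothing_loss_floor(num_classes, smoothing):
    """Lowest cross-entropy reachable against a label-smoothed target.

    The loss is minimized when the prediction equals the smoothed target,
    in which case it equals the entropy of that target distribution.
    """
    off_target = smoothing / num_classes          # mass on each wrong class
    on_target = 1.0 - smoothing + off_target      # mass on the true class
    return -(
        on_target * math.log(on_target)
        + (num_classes - 1) * off_target * math.log(off_target)
    )


floor = label_smoothing_loss_floor(num_classes=196, smoothing=0.1)
print(f"Lowest reachable loss: {floor:.3f}")  # roughly 0.85
```

Under these assumptions, the reported train loss of 0.9010 sits close to that floor, so it is consistent with the high training accuracy rather than a sign of underfitting.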
Usage (Example)
from PIL import Image
from transformers import AutoImageProcessor, ConvNextForImageClassification
import torch
# Load model and processor
model = ConvNextForImageClassification.from_pretrained("todorristov/car_classification_model")
processor = AutoImageProcessor.from_pretrained("todorristov/car_classification_model")
# Load and preprocess image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
# Predict
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = logits.argmax(-1).item()
print(f"Predicted class ID: {predicted_class}")
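A class ID alone is hard to read, so it can help to turn the logits into the few most likely labels with their probabilities. The sketch below is illustrative only: `top_k_predictions` is a hypothetical helper, and the example logits and label map are stand-in values. With the real model, `logits` would come from the snippet above and the mapping from `model.config.id2label`, assuming the published config carries the class names.

```python
import torch


def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one image."""
    # Softmax turns raw logits into probabilities; squeeze drops the batch dim.
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    top_probs, top_ids = probs.topk(k)
    return [(id2label[i.item()], p.item()) for i, p in zip(top_ids, top_probs)]


# Stand-in values for illustration (not real model output).
example_logits = torch.tensor([[0.1, 2.5, 0.3, 1.7]])
example_id2label = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}

for label, prob in top_k_predictions(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.2%}")
```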
Training Details
- Framework: PyTorch
- Hardware: NVIDIA RTX 4060
- Epochs: 32 maximum (early stopping ended training after 28 epochs)
- Batch Size: 32
- Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
- Loss Function: Cross Entropy (label_smoothing=0.1)
- Scheduler: ReduceLROnPlateau (factor=0.5, patience=2, min_lr=1e-6)
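The settings listed above map onto standard PyTorch components. The following is a minimal sketch under stated assumptions, not the project's training script: `build_training_components` and `EarlyStopping` are hypothetical names, the scheduler is assumed to monitor validation loss, the early-stopping patience of 5 is invented for illustration (the source does not give it), and a toy linear layer stands in for the ConvNeXt model.

```python
import torch
from torch import nn


def build_training_components(model):
    """Create the loss, optimizer, and scheduler with the listed hyperparameters."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    # Only parameters that are not frozen are handed to the optimizer.
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable_params, lr=1e-4, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2, min_lr=1e-6
    )
    return criterion, optimizer, scheduler


class EarlyStopping:
    """Stop once validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


toy_model = nn.Linear(8, 196)  # stand-in for the real model
criterion, optimizer, scheduler = build_training_components(toy_model)
stopper = EarlyStopping(patience=5)  # hypothetical patience value

# Made-up validation losses that plateau, to show the scheduler reacting.
for epoch, val_loss in enumerate([1.5, 1.3, 1.3, 1.3, 1.3], start=1):
    scheduler.step(val_loss)
    lr = optimizer.param_groups[0]["lr"]
    print(f"epoch {epoch}: val_loss={val_loss:.2f}, lr={lr:.1e}")
    if stopper.should_stop(val_loss):
        break
```

With patience=2, the scheduler halves the learning rate once the monitored loss has failed to improve for three consecutive epochs, which happens on the fifth epoch of this made-up sequence.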
These results suggest that fine-tuning a high-capacity pretrained model is effective on a modestly sized, domain-specific dataset: the model reaches 92.30% validation accuracy even though many car models and model years look very similar.
Limitations
- Trained only on 196 classes from Stanford Cars (mostly 1990-2012 U.S. models)
- Poor performance on:
  - Damaged or modified vehicles
  - Non-standard angles or lighting
- Not suitable for unseen/new car models; retraining is needed
Project Details
- Developed by: Todor Ristov, Goran Nikoloski, Milana Sokolova
- For: TwinCar Project, Sols (Skopje, North Macedonia)
- Language: Python
- Framework: PyTorch
- License: MIT
Resources
- Stanford Cars Dataset: https://huggingface.co/datasets/tanganke/stanford_cars
- Model Card: https://huggingface.co/sols/car-classification-convnext
- GitHub Repository: https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1
- Demo Space: https://huggingface.co/spaces/todorristov/car-classification-convnext
Contributing
Contributions are welcome! Please open an issue or submit a pull request. Make sure to update tests and documentation as needed.
Project Structure
project_root/
│
├── images/                      # Model architecture visualizations
│
├── models/                      # Stores trained model checkpoints (e.g., best_model.pt)
│   └── best_model.pt
│
├── notebooks/                   # Jupyter notebooks for model exploration and experiments
│
├── reports/                     # Training logs (loss, accuracy, LR, time, etc.)
│
├── src/                         # Source code
│   ├── data/                    # Data-related scripts
│   │   ├── datadownloader.py    # Downloads and saves dataset to local folders
│   │   └── datatransforms.py    # Data augmentation and preprocessing transforms
│   │
│   ├── models/                  # Model utilities
│   │   └── load_model.py        # Loads model, processor, and device
│   │
│   ├── utils/                   # Utility scripts
│   │   └── save_label_map.py    # Saves class label map
│   │
│   ├── evaluate.py              # Evaluation logic per epoch
│   ├── inference.py             # Inference script for classifying new images
│   ├── train_utils.py           # Training helper functions (e.g., metric calc, logging)
│   ├── train.py                 # Main training script
│   └── visualize.py             # Visualizations (e.g., confusion matrix, sample predictions)
│
├── README.md                    # Project documentation
└── requirements.txt             # Project dependencies
Citation
@misc{twin-car-classification,
  title={Car Classification - Brand, Model \& Model Year},
  author={Todor Ristov},
  year={2025},
  howpublished={\url{https://huggingface.co/todorristov/car_classification_model}},
  note={A deep learning pipeline for vehicle recognition.}
}
Feel free to star the repo and share your feedback!