CAR CLASSIFICATION - Brand, Model & Model Year

This project is a deep learning pipeline that classifies car brand, model, and model year from a single image using a fine-tuned ConvNeXt model. It uses the Stanford Cars dataset and leverages transfer learning with facebook/convnext-large-224. Built in PyTorch, this modular and scalable pipeline supports training, evaluation, and inference.

🔍 Key Features

Download and preprocess image data from Hugging Face
Fine-tune pretrained ConvNeXt models (modern ConvNets inspired by transformers)
Track training metrics and model checkpoints
Predict the class of custom input images using saved models
Modular design for training, evaluation, and inference

🧰 Installation

🔧 Setup Instructions

Clone the repo from GitHub

git clone https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1  

cd CarClassificationTeam1

Create and activate a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Download the dataset

python -m src.data.datadownloader

📝 Requirements

Python 3.8+
PyTorch 2.3.0+cu126
torchvision 0.18.0+cu126
torchaudio 2.3.0+cu126
transformers
datasets
Other dependencies as listed in requirements.txt

🧠 Model Architecture

We fine-tuned a pretrained ConvNeXt model, a convolutional network whose design borrows ideas from vision transformers:

Model: ConvNeXt-Base (224x224 resolution)
Pretrained on: ImageNet-1k
Fine-tuned on: Stanford Cars (196 classes)
Transfer Learning: Only the last two ConvNeXt stages and the classification head were trained

Because the Stanford Cars dataset is relatively small (~8,100 training and ~8,000 validation images), we adopted a transfer learning strategy. The ConvNeXt backbone was initialized with weights pretrained on ImageNet-1k, and the classification head was replaced with a randomly initialized layer sized for our 196 target classes.

To balance generalization and training efficiency, we unfroze and trained only the last two stages of the ConvNeXt backbone (Stages 3 and 4), along with the classification head. Earlier layers remained frozen to preserve robust pretrained features.
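The freezing scheme described above can be sketched as follows. This is a minimal illustration, assuming the Hugging Face `ConvNextForImageClassification` layout (`model.convnext.encoder.stages` and `model.classifier`). The helper name `freeze_all_but_last_stages` is hypothetical and not taken from the project's code, and a tiny randomly initialized config stands in for the pretrained checkpoint so the snippet runs without a download.

```python
from transformers import ConvNextConfig, ConvNextForImageClassification


def freeze_all_but_last_stages(model, num_trainable_stages=2):
    """Freeze the backbone except its last stages; keep the head trainable."""
    # Start by freezing every parameter in the model.
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze the last `num_trainable_stages` stages of the encoder.
    for stage in model.convnext.encoder.stages[-num_trainable_stages:]:
        for param in stage.parameters():
            param.requires_grad = True

    # Unfreeze the classification head.
    for param in model.classifier.parameters():
        param.requires_grad = True
    return model


# Assumption: a deliberately tiny config, used only to keep the demo fast.
config = ConvNextConfig(hidden_sizes=[8, 16, 32, 64], depths=[1, 1, 1, 1], num_labels=196)
model = freeze_all_but_last_stages(ConvNextForImageClassification(config))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

With a real checkpoint, the same helper would be applied to the model returned by `from_pretrained`, and only the parameters with `requires_grad=True` would be passed to the optimizer.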

Data Augmentation:

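from torchvision import transforms

# image_size, mean, and std are defined elsewhere in the project; they are
# assumed here to match the model's input size and normalization statistics.
transforms.Compose([
    # Geometric and color augmentations applied to the PIL image
    transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0), ratio=(0.75, 1.33)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
    # Tensor-level augmentations, then normalization
    transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
    transforms.Normalize(mean=mean, std=std),
])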

📊 Performance

Train Accuracy: 98.62%
Validation Accuracy: 92.30%
Train Loss (Cross Entropy): 0.9010
Validation Loss (Cross Entropy): 1.1231
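A training loss near 0.9 alongside 98.62% training accuracy can look surprising. One likely explanation, assuming the reported losses were computed with the label-smoothed criterion listed under Training Details (label_smoothing=0.1), is that smoothing puts a floor under the loss. The worked calculation below is a sketch of that floor for 196 classes; the function name is illustrative.

```python
import math


def smoothed_ce_floor(num_classes, smoothing):
    """Lowest cross entropy reachable against label-smoothed targets.

    With smoothing eps and K classes, the target puts (1 - eps) + eps / K on
    the true class and eps / K on every other class. The loss is minimized
    when the model predicts exactly that distribution, and the minimum equals
    the entropy of the target distribution.
    """
    if smoothing == 0:
        return 0.0  # one-hot targets: a perfect prediction reaches zero loss
    off = smoothing / num_classes
    on = 1.0 - smoothing + off
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))


floor = smoothed_ce_floor(num_classes=196, smoothing=0.1)
print(f"loss floor with label smoothing: {floor:.4f}")
```

The floor comes out to roughly 0.85, so under this assumption a training loss of 0.9010 is already close to the minimum the criterion allows.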

🚀 Usage (Example)

from PIL import Image
from transformers import AutoImageProcessor, ConvNextForImageClassification
import torch

# Load model and processor
model = ConvNextForImageClassification.from_pretrained("todorristov/car_classification_model")
processor = AutoImageProcessor.from_pretrained("todorristov/car_classification_model")

# Load and preprocess image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Predict
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = logits.argmax(-1).item()

print(f"Predicted class ID: {predicted_class}")
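The usage example prints a numeric class ID. A small helper like the one below could turn logits into readable top-k predictions. This is a sketch: `top_k_predictions` is a hypothetical name, and it assumes the checkpoint's `model.config.id2label` holds meaningful class names, which this README does not confirm. Stand-in logits and labels are used so the snippet runs without downloading the model.

```python
import torch


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    # Convert raw logits of shape (1, num_classes) into probabilities.
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    top_probs, top_ids = probs.topk(k)
    return [(id2label[i], p) for i, p in zip(top_ids.tolist(), top_probs.tolist())]


# Stand-in values; with the real model, pass `logits` and
# `model.config.id2label` from the usage example instead.
demo_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
demo_labels = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}

for label, prob in top_k_predictions(demo_logits, demo_labels, k=2):
    print(f"{label}: {prob:.3f}")
```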

🏋️ Training Details

Framework: PyTorch
Hardware: NVIDIA RTX 4060
Epochs: 32 scheduled (early stopping ended training after 28 epochs)
Batch Size: 32
Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
Loss Function: Cross Entropy (label_smoothing=0.1)
Scheduler: ReduceLROnPlateau (factor=0.5, patience=2, min_lr=1e-6)
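The settings above map onto standard PyTorch objects roughly as shown below. This is a sketch, not the project's training script: a tiny linear layer stands in for the ConvNeXt model, the scheduler is assumed to monitor validation loss, the `EarlyStopping` class and its patience of 5 are hypothetical (the README does not state the stopping criterion), and the validation losses are made up to exercise the logic.

```python
import torch
from torch import nn

# A tiny linear layer stands in for the fine-tuned ConvNeXt model.
model = nn.Linear(16, 196)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2, min_lr=1e-6
)


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=5)  # hypothetical value, not from the README

# The criterion applied to a random batch, just to show the call shape.
loss = criterion(model(torch.randn(4, 16)), torch.randint(0, 196, (4,)))

# Made-up validation losses that plateau after the second epoch.
fake_val_losses = [1.30, 1.20, 1.20, 1.20, 1.20, 1.20, 1.20, 1.20]
lr_history = []
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses, start=1):
    # (one epoch of training and evaluation would run here)
    scheduler.step(val_loss)  # halves the LR after more than 2 stagnant epochs
    lr_history.append(optimizer.param_groups[0]["lr"])
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print("learning rates:", lr_history)
print("stopped at epoch:", stopped_at)
```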

The 92.30% validation accuracy indicates that fine-tuning a high-capacity pretrained model is effective on a medium-sized, domain-specific dataset. The model generalizes well despite the visual similarity between different car models and model years.

⚠️ Limitations

Trained only on 196 classes from Stanford Cars (mostly 1990–2012 U.S. models)
Poor performance on:
- Damaged or modified vehicles
- Non-standard angles or lighting
Not suitable for unseen or new car models; retraining is needed

🛠 Project Details

Developed by: Todor Ristov, Goran Nikoloski, Milana Sokolova
For: TwinCar Project, Sols (Skopje, North Macedonia)
Language: Python
Framework: PyTorch
License: MIT

🔗 Resources

📚 Stanford Cars Dataset: https://huggingface.co/datasets/tanganke/stanford_cars
🤗 Model Card: https://huggingface.co/sols/car-classification-convnext
🌐 GitHub Repository: https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1
🌟 Demo Space: https://huggingface.co/spaces/todorristov/car-classification-convnext

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request. Make sure to update tests and documentation as needed.

📂 Project Structure

project_root/
│
├── images/                     # Model architecture visualizations
│
├── models/                     # Stores trained model checkpoints (e.g., best_model.pt)
│   └── best_model.pt
│
├── notebooks/                  # Jupyter notebooks for model exploration and experiments
│
├── reports/                    # Training logs (loss, accuracy, LR, time, etc.)
│
├── src/                        # Source code
│   ├── data/                   # Data-related scripts
│   │   ├── datadownloader.py   # Downloads and saves dataset to local folders
│   │   └── datatransforms.py   # Data augmentation and preprocessing transforms
│   │
│   ├── models/                 # Model utilities
│   │   └── load_model.py       # Loads model, processor, and device
│   │
│   ├── utils/                  # Utility scripts
│   │   └── save_label_map.py   # Saves class label map
│   │
│   ├── evaluate.py             # Evaluation logic per epoch
│   ├── inference.py            # Inference script for classifying new images
│   ├── train_utils.py          # Training helper functions (e.g., metric calc, logging)
│   ├── train.py                # Main training script
│   └── visualize.py            # Visualizations (e.g., confusion matrix, sample predictions)
│
├── README.md                   # Project documentation
└── requirements.txt            # Project dependencies

💬 Citation

@misc{twin-car-classification,
  title={Car Classification - Brand, Model & Model Year},
  author={Todor Ristov},
  year={2025},
  howpublished={\url{https://huggingface.co/todorristov/car_classification_model}},
  note={A deep learning pipeline for vehicle recognition.}
}

Feel free to ⭐ the repo and share your feedback!
