CAR CLASSIFICATION - Brand, Model & Model Year

This project is a deep learning pipeline that classifies car brand, model, and model year from a single image using a fine-tuned ConvNeXt model. It uses the Stanford Cars dataset and leverages transfer learning with facebook/convnext-large-224. Built in PyTorch, this modular and scalable pipeline supports training, evaluation, and inference.


🔍 Key Features

  • Download and preprocess image data from Hugging Face
  • Fine-tune pretrained ConvNeXt models (modern ConvNets inspired by transformers)
  • Track training metrics and model checkpoints
  • Predict the class of custom input images using saved models
  • Modular design for training, evaluation, and inference

🧰 Installation

🔧 Setup Instructions

  1. Clone the repo from GitHub
git clone https://github.com/Brainster-Data-Science-Academy/CarClassificationTeam1
cd CarClassificationTeam1
  2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download the dataset
python -m src.data.datadownloader

πŸ“ Requirements

  • Python 3.8+
  • PyTorch 2.3.0+cu126
  • torchvision 0.18.0+cu126
  • torchaudio 2.3.0+cu126
  • transformers
  • datasets
  • Other dependencies as listed in requirements.txt
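
One quick way to confirm that an environment matches the list above is to query the installed package versions. The sketch below uses only the standard library. The package names come from the list; the helper name `installed_versions` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above.
REQUIRED = ["torch", "torchvision", "torchaudio", "transformers", "datasets"]


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<14} {ver or 'MISSING'}")
```

Any package reported as MISSING can be installed with `pip install -r requirements.txt`.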

🧠 Model Architecture

ConvNeXt Architecture

We fine-tuned a pretrained ConvNeXt model, a convolutional network whose design borrows ideas from vision transformers:

  • Model: ConvNeXt-Large (224x224 resolution)
  • Pretrained on: ImageNet-1k
  • Fine-tuned on: Stanford Cars (196 classes)
  • Transfer Learning: Only the last two ConvNeXt stages and the classification head were trained

Because the Stanford Cars dataset is relatively small (~8,100 training and ~8,000 validation images), we adopted a transfer learning strategy. The ConvNeXt model was initialized with weights pretrained on ImageNet-1k; only the final classification head was randomly initialized, sized for our 196 target classes.

To balance generalization and training efficiency, we unfroze and trained only the last two stages of the ConvNeXt backbone (Stages 3 and 4), along with the classification head. Earlier layers remained frozen to preserve robust pretrained features.
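
The freezing scheme described above can be sketched as follows. This is a minimal illustration, not the repository's training code. It assumes the Hugging Face `ConvNextForImageClassification` layout, where the stages live under `model.convnext.encoder.stages`. The helper name `freeze_all_but_last_stages` is hypothetical, and leaving the final backbone layer norm frozen is an assumption, since the card does not say how it was handled.

```python
from transformers import ConvNextConfig, ConvNextForImageClassification


def freeze_all_but_last_stages(model, trainable_stages=(2, 3)):
    """Freeze the backbone except the given stages (0-indexed) and the head."""
    for param in model.parameters():
        param.requires_grad = False
    for idx in trainable_stages:  # Stages 3 and 4 in the 1-indexed terms used above
        for param in model.convnext.encoder.stages[idx].parameters():
            param.requires_grad = True
    for param in model.classifier.parameters():
        param.requires_grad = True
    return model


# Tiny randomly initialized config so the sketch runs without downloading weights.
# For real fine-tuning one would presumably load the checkpoint instead, e.g.
# ConvNextForImageClassification.from_pretrained(
#     "facebook/convnext-large-224", num_labels=196, ignore_mismatched_sizes=True)
config = ConvNextConfig(depths=[1, 1, 1, 1], hidden_sizes=[8, 16, 32, 64], num_labels=196)
model = freeze_all_but_last_stages(ConvNextForImageClassification(config))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps the frozen layers out of the update step.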

Data Augmentation:

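from torchvision import transforms

# image_size, mean, and std are set elsewhere in the pipeline
# (224 matches the model's input resolution).
train_transforms = transforms.Compose([
    # Geometric and color augmentations applied to the PIL image
    transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0), ratio=(0.75, 1.33)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomGrayscale(p=0.1),
    # RandomErasing and Normalize operate on tensors, so ToTensor comes first
    transforms.ToTensor(),
    transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
    transforms.Normalize(mean=mean, std=std),
])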

📊 Performance

  • Train Accuracy: 98.62%
  • Validation Accuracy: 92.30%
  • Train Loss (Cross Entropy): 0.9010
  • Validation Loss (Cross Entropy): 1.1231
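
The train loss may look high next to 98.62% accuracy. A likely reason is label smoothing (0.1, per Training Details): with smoothed targets, cross entropy cannot reach zero, and its minimum equals the entropy of the smoothed target distribution. The sketch below computes that floor for 196 classes. It assumes the PyTorch convention in which the true class receives 1 - ε + ε/K and every other class receives ε/K. The function name is hypothetical.

```python
import math


def label_smoothing_loss_floor(num_classes, smoothing):
    """Minimum cross entropy reachable with smoothed targets (entropy of the target)."""
    off = smoothing / num_classes   # probability mass on each wrong class
    on = 1.0 - smoothing + off      # probability mass on the true class
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))


floor = label_smoothing_loss_floor(num_classes=196, smoothing=0.1)
print(f"loss floor: {floor:.4f}")  # roughly 0.85
```

Under these assumptions, the reported train loss of 0.9010 sits close to the floor, which is consistent with the high train accuracy.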

🚀 Usage (Example)

from PIL import Image
from transformers import AutoImageProcessor, ConvNextForImageClassification
import torch

# Load model and processor
model = ConvNextForImageClassification.from_pretrained("todorristov/car_classification_model")
processor = AutoImageProcessor.from_pretrained("todorristov/car_classification_model")

# Load and preprocess image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Predict
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = logits.argmax(-1).item()

print(f"Predicted class ID: {predicted_class}")
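
A class ID alone is hard to read, so a natural next step is to map it to a label and inspect the top few candidates. The sketch below is one way to do that. The helper `top_k_predictions` is hypothetical, and it is an assumption that the published checkpoint's `model.config.id2label` holds readable class names. The demo uses stand-in logits and labels so it runs without downloading the model.

```python
import torch


def top_k_predictions(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs for one image."""
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    top = torch.topk(probs, k=min(k, probs.numel()))
    return [
        (id2label.get(idx, str(idx)), prob)
        for prob, idx in zip(top.values.tolist(), top.indices.tolist())
    ]


# Stand-in logits and label map; with the real model one would pass
# `logits` from the snippet above and `model.config.id2label`.
dummy_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
dummy_labels = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}
for label, prob in top_k_predictions(dummy_logits, dummy_labels, k=3):
    print(f"{label}: {prob:.3f}")
```

Low top-1 probabilities can serve as a rough signal that an image falls outside the 196 trained classes.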

πŸ‹οΈ Training Details

  • Framework: PyTorch
  • Hardware: NVIDIA RTX 4060
  • Epochs: 32 scheduled (training stopped early after 28 epochs)
  • Batch Size: 32
  • Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
  • Loss Function: Cross Entropy (label_smoothing=0.1)
  • Scheduler: ReduceLROnPlateau (factor=0.5, patience=2, min_lr=1e-6)

Training and Validation Metrics

These results show that fine-tuning a high-capacity pretrained model works well on a medium-sized, domain-specific dataset: validation accuracy reaches 92.30% even though many car models and model years look alike.


⚠️ Limitations

  • Trained only on 196 classes from Stanford Cars (mostly 1990–2012 U.S. models)
  • Poor performance on:
    • Damaged or modified vehicles
    • Non-standard angles or lighting
  • Not suitable for unseen/new car models; retraining is needed

🛠 Project Details

  • Developed by: Todor Ristov, Goran Nikoloski, Milana Sokolova
  • For: TwinCar Project, Sols (Skopje, North Macedonia)
  • Language: Python
  • Framework: PyTorch
  • License: MIT

🔗 Resources


🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request. Make sure to update tests and documentation as needed.


📂 Project Structure

project_root/
│
├── images/                     # Model architecture visualizations
│
├── models/                     # Stores trained model checkpoints (e.g., best_model.pt)
│   └── best_model.pt
│
├── notebooks/                  # Jupyter notebooks for model exploration and experiments
│
├── reports/                    # Training logs (loss, accuracy, LR, time, etc.)
│
├── src/                        # Source code
│   ├── data/                   # Data-related scripts
│   │   ├── datadownloader.py   # Downloads and saves dataset to local folders
│   │   └── datatransforms.py   # Data augmentation and preprocessing transforms
│   │
│   ├── models/                 # Model utilities
│   │   └── load_model.py       # Loads model, processor, and device
│   │
│   ├── utils/                  # Utility scripts
│   │   └── save_label_map.py   # Saves class label map
│   │
│   ├── evaluate.py             # Evaluation logic per epoch
│   ├── inference.py            # Inference script for classifying new images
│   ├── train_utils.py          # Training helper functions (e.g., metric calc, logging)
│   ├── train.py                # Main training script
│   └── visualize.py            # Visualizations (e.g., confusion matrix, sample predictions)
│
├── README.md                   # Project documentation
└── requirements.txt            # Project dependencies

💬 Citation

@misc{twin-car-classification,
  title={Car Classification - Brand, Model \& Model Year},
  author={Todor Ristov},
  year={2025},
  howpublished={\url{https://huggingface.co/todorristov/car_classification_model}},
  note={A deep learning pipeline for vehicle recognition.}
}

Feel free to ⭐ the repo and share your feedback!
