Training a CNN with PyTorch for CIFAR-10

Aug 7, 2024

Training a Convolutional Neural Network with PyTorch

Introduction

  • Objective: Train a convolutional neural network (CNN) using PyTorch to classify images from the CIFAR-10 dataset.
  • CIFAR-10 Classes: ten categories of small color images, including dogs, cats, horses, ships, trucks, and cars (the full list appears under Understanding the Data).

Prerequisites

  • Install necessary packages:
    • numpy
    • Pillow
    • torch
    • torchvision
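
All four are available from PyPI under these standard package names, so one way to install them is a single pip command:

    pip install numpy Pillow torch torchvision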

Setting Up the Environment

  • Use a Jupyter Notebook so individual cells can be rerun without repeating the whole script.

Data Preparation

  1. Import Libraries:

    • numpy, PIL, torch, torch.nn, torch.optim, torchvision, torchvision.transforms
  2. Data Transformations:

    • Scale pixel values from [0, 255] to [-1, 1].
    • Use transforms.Compose to chain transformations:
      • transforms.ToTensor() converts images to tensors scaled to [0, 1].
      • transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) then maps each channel to [-1, 1].
  3. Load the CIFAR-10 Dataset:

    • Specify a data directory and load the training and test splits:
      • train=True for training dataset, train=False for test dataset.
      • Set download=True to download the dataset if not present.
    • Create DataLoaders for training and testing with a batch size of 32; enable shuffling for the training set (evaluation does not depend on order). A minimal sketch of all three steps follows this list.
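
The sketch assumes a data directory of ./data (the outline does not fix a path); the transforms, splits, and loaders follow the bullets above:

    import numpy as np                # listed in the prerequisites; not strictly needed below
    from PIL import Image             # used later to load new images
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    import torchvision.transforms as transforms

    # ToTensor scales [0, 255] pixel values to [0, 1]; Normalize then maps
    # each channel to [-1, 1] via (x - 0.5) / 0.5.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    train_data = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform)
    test_data = torchvision.datasets.CIFAR10(
        root='./data', train=False, download=True, transform=transform)

    train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=False)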

Understanding the Data

  • Inspect the shape of the image data:
    • Expected shape: 3 (RGB channels) x 32 x 32 pixels.
  • Class labels:
    • List of class names: Plane, Car, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck.
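
A quick check, reusing train_data from the sketch above:

    image, label = train_data[0]      # first training example, after transforms
    print(image.shape)                # torch.Size([3, 32, 32])

    class_names = ['plane', 'car', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    print(class_names[label])         # human-readable label for this example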

Building the Convolutional Neural Network (CNN)

  1. Define the CNN Class:

    • Extend from nn.Module.
    • Implement the __init__ method (Constructor) to define the network architecture:
      • Convolutional Layers:
        • First Layer: nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5)
        • Apply Max Pooling after each convolution: nn.MaxPool2d(kernel_size=2)
        • Second Layer: nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5)
      • Flatten the output for fully connected layers.
      • Define fully connected layers (Dense layers):
        • Input: 24 x 5 x 5 = 600 features (each 5x5 convolution without padding shrinks the feature map by 4 pixels, 32 → 28 and 14 → 10, and each 2x2 pooling halves it, 28 → 14 and 10 → 5).
        • Hidden Layers: 120 neurons, then 84 neurons.
        • Output Layer: 10 neurons (one per class).
  2. Forward Method:

    • Pass input through the network:
      • Convolution + ReLU + Pooling layers.
      • Flatten before entering fully connected layers.
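
One way to write the class, given the dimensions above (the names NeuralNet, conv1, fc1, etc. are illustrative):

    class NeuralNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 12, 5)      # 3 x 32 x 32  -> 12 x 28 x 28
            self.pool = nn.MaxPool2d(2, 2)        # halves height and width
            self.conv2 = nn.Conv2d(12, 24, 5)     # 12 x 14 x 14 -> 24 x 10 x 10
            self.fc1 = nn.Linear(24 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)          # one logit per class

        def forward(self, x):
            x = self.pool(torch.relu(self.conv1(x)))   # -> 12 x 14 x 14
            x = self.pool(torch.relu(self.conv2(x)))   # -> 24 x 5 x 5
            x = torch.flatten(x, 1)                    # keep the batch dimension
            x = torch.relu(self.fc1(x))
            x = torch.relu(self.fc2(x))
            return self.fc3(x)                         # raw logits for CrossEntropyLoss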

Training the Model

  1. Define Loss Function and Optimizer:

    • Loss Function: multi-class cross-entropy via nn.CrossEntropyLoss(), which expects raw logits (so the network applies no softmax).
    • Optimizer: Stochastic Gradient Descent (SGD) with a learning rate of 0.001 and momentum of 0.9.
  2. Training Loop:

    • Iterate through epochs (e.g., 30 epochs).
    • Calculate running loss, perform backpropagation, and update weights.
    • Monitor loss to ensure it decreases over epochs.
  3. Save Model Parameters:

    • Save the trained model's state dictionary for future use.
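
A sketch of the loop with those settings; the file name trained_net.pth is a placeholder:

    net = NeuralNet()
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(30):
        running_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()                  # clear gradients from the last step
            loss = loss_function(net(inputs), labels)
            loss.backward()                        # backpropagation
            optimizer.step()                       # update the weights
            running_loss += loss.item()
        # Average loss per batch; this should trend downward across epochs.
        print(f'Epoch {epoch + 1}: loss {running_loss / len(train_loader):.4f}')

    torch.save(net.state_dict(), 'trained_net.pth')   # placeholder file name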

Evaluation

  1. Test the Model:

    • Set the model to evaluation mode and disable gradient computation.
    • Evaluate on the test dataset and calculate accuracy:
      • Count correct predictions vs. total predictions.
  2. Predict on New Images:

    • Load new images and apply necessary transformations:
      • Resize to 32 x 32 pixels.
    • Get predictions for new images and output class labels.
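
A sketch of both steps, reusing the names above; example.jpg stands in for whatever new image is being classified:

    # Reload the saved parameters into a fresh network, then switch to eval mode.
    net = NeuralNet()
    net.load_state_dict(torch.load('trained_net.pth'))
    net.eval()

    correct = total = 0
    with torch.no_grad():              # gradients are not needed for inference
        for images, labels in test_loader:
            predicted = net(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Accuracy: {100 * correct / total:.2f}%')

    # New images must pass through the same pipeline, plus a resize to 32 x 32.
    new_transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    image = new_transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)  # placeholder path
    with torch.no_grad():
        print(class_names[net(image).argmax(dim=1).item()])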

Conclusion

  • The trained model classifies test images far above the 10% random-guess baseline, reaching roughly 68% accuracy.
  • Future work: Explore how to deploy the model as an application on a server.