
Building a Neural Network from Scratch

Jul 20, 2024


Introduction

  • Presenter: Samson
  • Topic: Building a neural network from scratch (without TensorFlow or Keras, using only NumPy)
  • Focus: Implementing the equations and linear algebra by hand to understand neural networks at a deeper level

Problem Overview

  • Task: Digit classification using the MNIST dataset
  • MNIST Dataset: 28x28 grayscale images of handwritten digits (0-9)
  • Goal: Build a neural network to classify the images

Data Representation

  • Images: Each image is 28x28 pixels, flattened into a 784-dimensional vector (see the sketch below)
  • Pixel Values: Range from 0 (black) to 255 (white)
  • Data Matrix: Transposed so that columns correspond to examples and rows to pixels
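A minimal sketch of the flattening described above, assuming a single 28x28 grayscale image stored as a NumPy array; scaling to [0, 1] is a common extra step, not stated in these notes:

```python
import numpy as np

image = np.random.randint(0, 256, size=(28, 28))   # stand-in for one grayscale digit
pixels = image.reshape(784)                        # flatten 28x28 into a 784-length vector
pixels = pixels / 255.0                            # scale pixel values from [0, 255] to [0, 1]
```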

Neural Network Architecture

  • Input Layer: 784 nodes (one for each pixel)
  • Hidden Layer: 10 units
  • Output Layer: 10 units (each node representing a digit from 0 to 9)
  • Layer Naming:
    • Input Layer (layer 0)
    • Hidden Layer (layer 1)
    • Output Layer (layer 2); a parameter-shape sketch for this architecture follows below
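Given the 784 → 10 → 10 architecture, the parameter shapes follow directly. A minimal initialization sketch (the exact random scheme used in the original may differ):

```python
import numpy as np

def init_params():
    # Shapes follow the 784 -> 10 -> 10 architecture described above.
    W1 = np.random.rand(10, 784) - 0.5   # hidden-layer weights
    b1 = np.random.rand(10, 1) - 0.5     # hidden-layer biases
    W2 = np.random.rand(10, 10) - 0.5    # output-layer weights
    b2 = np.random.rand(10, 1) - 0.5     # output-layer biases
    return W1, b1, W2, b2
```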

Training Phases

Forward Propagation

  • Step: Calculate final output starting from the input layer
  • Variables: A0 (input), Z1 (linear combination of A0 with weights and biases), A1 (Z1 passed through ReLU); Z2 and A2 are formed the same way for the output layer (see the sketch after this list)
  • Weights (W) and Biases (B): Applied iteratively
  • Activation Function: Non-linear function (ReLU) applied to results
  • Output Activation: Softmax (probabilities for class prediction)
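A sketch of the forward pass these bullets describe; function and variable names are illustrative, and subtracting the column-wise max inside softmax is a numerical-stability assumption beyond the notes:

```python
import numpy as np

def ReLU(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    # Subtract the column-wise max before exponentiating to avoid overflow.
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1   # linear combination for the hidden layer
    A1 = ReLU(Z1)         # non-linear activation
    Z2 = W2.dot(A1) + b2  # linear combination for the output layer
    A2 = softmax(Z2)      # class probabilities, one column per example
    return Z1, A1, Z2, A2
```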

Back Propagation

  • Error Calculation: Evaluate error in predictions (difference between predicted and actual)
  • Weight & Bias Adjustment: Using the computed error, adjust weights and biases through derived relationships
  • Learning Rate (α): Determines the update magnitude (hyperparameter)
  • Update Equations:
    • W1 = W1 - α * dW1
    • B1 = B1 - α * dB1
    • W2 = W2 - α * dW2 and B2 = B2 - α * dB2 (see the gradient sketch below)
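A sketch of the gradient computations and updates implied above, assuming labels Y are integer digits that get one-hot encoded and m is the number of training examples:

```python
import numpy as np

def one_hot(Y, num_classes=10):
    # Turn integer labels of shape (m,) into a (num_classes, m) one-hot matrix.
    one_hot_Y = np.zeros((num_classes, Y.size))
    one_hot_Y[Y, np.arange(Y.size)] = 1
    return one_hot_Y

def back_prop(Z1, A1, A2, W2, X, Y):
    m = Y.size
    dZ2 = A2 - one_hot(Y)                      # output error: predicted minus actual
    dW2 = dZ2.dot(A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = W2.T.dot(dZ2) * (Z1 > 0)             # ReLU derivative is 1 where Z1 > 0, else 0
    dW1 = dZ1.dot(X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # The update equations listed above, applied to both layers.
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```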

Implementation

Libraries and Data Preparation

  • Libraries: NumPy, Pandas
  • Data Loading: Using Pandas to read CSV and convert to NumPy arrays
  • Data Splitting: Training data (actual learning) and Dev data (cross-validation/testing)
  • Shuffling and Transposing: Ensuring each column represents an example (see the sketch below)
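A data-preparation sketch along these lines, assuming the Kaggle MNIST 'train.csv' layout (label in the first column, 784 pixel columns after it); the 1,000-example dev split and the /255 scaling are illustrative choices:

```python
import numpy as np
import pandas as pd

data = pd.read_csv('train.csv')     # assumed CSV: label, pixel0 ... pixel783
data = np.array(data)
np.random.shuffle(data)             # shuffle rows before splitting

data_dev = data[:1000].T            # hold out the first 1,000 examples for cross-validation
Y_dev = data_dev[0]
X_dev = data_dev[1:] / 255.0        # columns are examples, rows are pixel features

data_train = data[1000:].T
Y_train = data_train[0]
X_train = data_train[1:] / 255.0
```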

Functions and Layers Implementation

  • Parameter Initialization: Function to initialize W1, W2, B1, B2 with random values
  • Forward Propagation: Function to carry signals from input to output using weights, biases, and activation functions
  • ReLU and Softmax Functions: Defined for the hidden-layer activation and the output layer
  • Back Propagation: Function to compute gradients and update weights and biases
  • Update Parameters: Adjust weights and biases based on backprop results
  • Gradient Descent: Main loop that iterates, calling forward and back propagation and updating weights/biases (sketched below)
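A minimal gradient-descent loop tying the functions above together (names match the earlier sketches; the iteration count and learning rate below are illustrative, not the original's settings):

```python
def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = back_prop(Z1, A1, A2, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 50 == 0:
            predictions = A2.argmax(axis=0)      # predicted digit per column
            print(f"iteration {i}: accuracy {(predictions == Y).mean():.3f}")
    return W1, b1, W2, b2

# Example usage (assumes X_train, Y_train from the data-preparation sketch):
# W1, b1, W2, b2 = gradient_descent(X_train, Y_train, alpha=0.1, iterations=500)
```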

Coding in Kaggle

  • Environment: Used Kaggle notebooks for coding, leveraging easy data handling
  • Implementation of Functions: Step-by-step coding of forward prop, back prop, and parameter updates
  • Iterative Training: Running multiple iterations to minimize error and adjust weights
  • Debugging: Adjusting the code to fix initialization issues and wire the functions together correctly

Model Performance

  • Training Accuracy: ~84% accuracy on training data within 30 minutes
  • Validation Accuracy: ~85.5% accuracy on cross-validation data (effectively testing data)
  • Error Analysis: Examples where the model misclassifies (e.g., tricky digits like rotated numbers); an accuracy helper is sketched below
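The accuracy figures above could be computed with a small helper such as this (names are illustrative and rely on the forward-propagation sketch):

```python
def predict(W1, b1, W2, b2, X):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    return A2.argmax(axis=0)            # highest-probability class per example

def accuracy(predictions, Y):
    return (predictions == Y).mean()

# e.g. accuracy(predict(W1, b1, W2, b2, X_dev), Y_dev)
```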

Conclusion

  • Satisfaction: High satisfaction in creating, debugging, and seeing model accuracy improve
  • Next Steps for Enhancement: Trying more layers, regularization, different optimization methods (e.g., momentum, RMSProp, Adam)
  • Resources: Link to blog post and notebook for further exploration
  • Final Thoughts: Building from scratch gives deep understanding and satisfaction compared to using higher-level libraries

Thank You: Thanks for watching and engaging!