Building a Neural Network from Scratch
Introduction
- Presenter: Samson
- Topic: Building a neural network from scratch (without TensorFlow or Keras, using only NumPy)
- Focus: Implementing the equations and linear algebra by hand to understand neural networks at a deeper level
Problem Overview
- Task: Digit classification using the MNIST dataset
- MNIST Dataset: 28x28 grayscale images of handwritten digits (0-9)
- Goal: Build a neural network to classify the images
Data Representation
- Images: Each image is 28x28 pixels, flattened into a 784-dimensional vector
- Pixel Values: Range from 0 (black) to 255 (white)
- Data Matrix: Transposed so that columns correspond to examples and rows to pixels (see the sketch below)
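A minimal NumPy sketch of this layout; the random array and the example count m below are stand-ins, not real MNIST data:

```python
import numpy as np

# Stand-in for m MNIST images; real data would come from the CSV file.
m = 5
images = np.random.randint(0, 256, size=(m, 28, 28))

# Flatten each 28x28 image to 784 values, then transpose so that
# columns are examples and rows are pixels, as described above.
X = images.reshape(m, 784).T
print(X.shape)  # (784, 5)
```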
Neural Network Architecture
- Input Layer: 784 nodes (one for each pixel)
- Hidden Layer: 10 units
- Output Layer: 10 units (each node representing a digit from 0 to 9)
- Layer Naming:
  - Layer 0: Input Layer
  - Layer 1: Hidden Layer
  - Layer 2: Output Layer
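The 784-10-10 architecture fixes the parameter shapes. The sketch below initializes them with random values centered on zero; the exact range [-0.5, 0.5] is an assumption, a common from-scratch choice rather than necessarily the presenter's code:

```python
import numpy as np

def init_params():
    # Hidden layer: maps 784 input pixels to 10 units.
    W1 = np.random.rand(10, 784) - 0.5   # shape (10, 784)
    b1 = np.random.rand(10, 1) - 0.5     # shape (10, 1)
    # Output layer: maps 10 hidden units to 10 digit classes.
    W2 = np.random.rand(10, 10) - 0.5    # shape (10, 10)
    b2 = np.random.rand(10, 1) - 0.5     # shape (10, 1)
    return W1, b1, W2, b2
```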
Training Phases
Forward Propagation
- Step: Calculate final output starting from the input layer
- Variables: A0 (input), Z1 = W1·A0 + B1 (linear combination), A1 = ReLU(Z1), Z2 = W2·A1 + B2, A2 = softmax(Z2) (output probabilities)
- Weights (W) and Biases (B): Applied iteratively
- Activation Function: Non-linear function (ReLU) applied to results
- Output Activation: Softmax (probabilities for class prediction)
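A sketch of the forward pass under the notation above (A0 = X); the max-subtraction inside softmax is a numerical-stability tweak added here as an assumption:

```python
import numpy as np

def ReLU(Z):
    # Element-wise max(0, z).
    return np.maximum(Z, 0)

def softmax(Z):
    # Column-wise softmax; subtracting the per-column max keeps exp() stable.
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1    # linear step into the hidden layer
    A1 = ReLU(Z1)          # hidden-layer activation
    Z2 = W2.dot(A1) + b2   # linear step into the output layer
    A2 = softmax(Z2)       # per-example class probabilities
    return Z1, A1, Z2, A2
```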
Back Propagation
- Error Calculation: Evaluate error in predictions (difference between predicted and actual)
- Weight & Bias Adjustment: Using the computed error, adjust weights and biases through derived relationships
- Learning Rate (α): Determines the update magnitude (hyperparameter)
- Update Equations:
  - W1 = W1 - α * dW1
  - B1 = B1 - α * dB1
  - Same for W2 and B2
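A sketch of the gradient computation feeding the update equations above, assuming the standard softmax-plus-cross-entropy gradient dZ2 = A2 - Y (with Y one-hot encoded) and (Z1 > 0) as the ReLU derivative:

```python
import numpy as np

def one_hot(Y, num_classes=10):
    # Turn integer labels of shape (m,) into a (10, m) one-hot matrix.
    Y_hot = np.zeros((num_classes, Y.size))
    Y_hot[Y, np.arange(Y.size)] = 1
    return Y_hot

def backward_prop(Z1, A1, A2, W2, X, Y):
    m = Y.size
    dZ2 = A2 - one_hot(Y)                      # error at the output layer
    dW2 = dZ2.dot(A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = W2.T.dot(dZ2) * (Z1 > 0)             # propagate error through ReLU
    dW1 = dZ1.dot(X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # The update equations above: parameter = parameter - alpha * gradient.
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```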
Implementation
Libraries and Data Preparation
- Libraries: NumPy, Pandas
- Data Loading: Using Pandas to read CSV and convert to NumPy arrays
- Data Splitting: Training data (actual learning) and Dev data (cross-validation/testing)
- Shuffling and Transposing: Ensuring each column represents an example
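A sketch of these preparation steps; the file name "train.csv", the 1,000-example dev split, and the division by 255 (scaling pixels to [0, 1]) are assumptions based on the Kaggle MNIST CSV layout (label in the first column, 784 pixel columns after it):

```python
import numpy as np
import pandas as pd

data = pd.read_csv("train.csv").to_numpy()   # hypothetical path to the MNIST CSV
np.random.shuffle(data)                      # shuffle rows before splitting

dev = data[:1000].T                          # transpose: columns become examples
Y_dev = dev[0]                               # first row holds the labels
X_dev = dev[1:] / 255.0                      # remaining 784 rows hold scaled pixels

train = data[1000:].T
Y_train = train[0]
X_train = train[1:] / 255.0
```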
Functions and Layers Implementation
- Parameter Initialization: Function to initialize W1, W2, B1, B2 with random values
- Forward Propagation: Function to carry signals from input to output using weights, biases, and activation functions
- ReLU and Softmax Functions: Defined for the hidden-layer activation and the output layer
- Back Propagation: Function to compute gradients and update weights and biases
- Update Parameters: Adjust weights and biases based on backprop results
- Gradient Descent: Main loop that iterates, calling forward and backpropagation, updating weights/biases
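A sketch of the main training loop that ties the earlier sketches together (init_params, forward_prop, backward_prop, update_params); the logging interval and the commented-out hyperparameters are assumptions:

```python
import numpy as np

def get_predictions(A2):
    # Predicted digit = index of the largest probability in each column.
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.mean(predictions == Y)

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, A2, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2,
                                       dW1, db1, dW2, db2, alpha)
        if i % 50 == 0:
            acc = get_accuracy(get_predictions(A2), Y)
            print(f"iteration {i}: training accuracy {acc:.3f}")
    return W1, b1, W2, b2

# Hypothetical call; learning rate and iteration count are guesses.
# W1, b1, W2, b2 = gradient_descent(X_train, Y_train, alpha=0.10, iterations=500)
```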
Coding in Kaggle
- Environment: Used Kaggle notebooks for coding, leveraging easy data handling
- Implementation of Functions: Step-by-step coding of forward prop, back prop, and parameter updates
- Iterative Training: Running multiple iterations to minimize error and adjust weights
- Debugging: Fixing initialization issues and making sure the functions integrate correctly
Model Performance
- Training Accuracy: ~84% accuracy on training data within 30 minutes
- Validation Accuracy: ~85.5% accuracy on cross-validation data (effectively testing data)
- Error Analysis: Examples where the model misclassifies (e.g., tricky digits like rotated numbers)
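A small sketch of how the dev-set accuracy quoted above could be computed, reusing the earlier helpers; the variable names follow the earlier data-preparation sketch:

```python
def make_predictions(X, W1, b1, W2, b2):
    # Forward pass only; return the predicted digit for each column of X.
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    return get_predictions(A2)

# Hypothetical evaluation on the held-out dev split:
# dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
# print(get_accuracy(dev_predictions, Y_dev))
```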
Conclusion
- Satisfaction: High satisfaction in creating, debugging, and seeing model accuracy improve
- Next Steps for Enhancement: Trying more layers, regularization, different optimization methods (e.g., momentum, RMSProp, Adam)
- Resources: Link to blog post and notebook for further exploration
- Final Thoughts: Building from scratch gives deep understanding and satisfaction compared to using higher-level libraries
- Thank You: Thanks for watching and engaging!