Building a Neural Network from Scratch
Introduction
- Presenter: Samson
- Topic: Building a neural network from scratch (without TensorFlow or Keras, using only NumPy)
- Focus: Implementing the equations and linear algebra by hand to understand neural networks at a deeper level
Problem Overview
- Task: Digit classification using the MNIST dataset
- MNIST Dataset: 28x28 grayscale images of handwritten digits (0-9)
- Goal: Build a neural network to classify the images
Data Representation
- Images: Each image is 28x28 pixels, flattened into a 784-dimensional vector
- Pixel Values: Range from 0 (black) to 255 (white)
- Data Matrix: Transposed so that columns correspond to examples and rows to pixels (see the sketch below)
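A minimal NumPy sketch of this layout; the random array and the example count m below are stand-ins, not real MNIST data:

```python
import numpy as np

# Stand-in for m MNIST images; real data would come from the CSV file.
m = 5
images = np.random.randint(0, 256, size=(m, 28, 28))

# Flatten each 28x28 image to 784 values, then transpose so that
# columns are examples and rows are pixels, as described above.
X = images.reshape(m, 784).T
print(X.shape)  # (784, 5)
```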
Neural Network Architecture
- Input Layer: 784 nodes (one for each pixel)
- Hidden Layer: 10 units
- Output Layer: 10 units (each node representing a digit from 0 to 9)
- Layer Naming:
  - Layer 0: Input Layer
  - Layer 1: Hidden Layer
  - Layer 2: Output Layer
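The 784-10-10 architecture fixes the parameter shapes. The sketch below initializes them with random values centered on zero; the exact range [-0.5, 0.5] is an assumption, a common from-scratch choice rather than necessarily the presenter's code:

```python
import numpy as np

def init_params():
    # Hidden layer: maps 784 input pixels to 10 units.
    W1 = np.random.rand(10, 784) - 0.5   # shape (10, 784)
    b1 = np.random.rand(10, 1) - 0.5     # shape (10, 1)
    # Output layer: maps 10 hidden units to 10 digit classes.
    W2 = np.random.rand(10, 10) - 0.5    # shape (10, 10)
    b2 = np.random.rand(10, 1) - 0.5     # shape (10, 1)
    return W1, b1, W2, b2
```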
Training Phases
Forward Propagation
- Step: Calculate final output starting from the input layer
- Variables: A0 (input), Z1 = W1·A0 + B1 (linear combination), A1 = ReLU(Z1), Z2 = W2·A1 + B2, A2 = softmax(Z2) (output probabilities)
- Weights (W) and Biases (B): Applied iteratively
- Activation Function: Non-linear function (ReLU) applied to results
- Output Activation: Softmax (probabilities for class prediction)
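A sketch of the forward pass under the notation above (A0 = X); the max-subtraction inside softmax is a numerical-stability tweak added here as an assumption:

```python
import numpy as np

def ReLU(Z):
    # Element-wise max(0, z).
    return np.maximum(Z, 0)

def softmax(Z):
    # Column-wise softmax; subtracting the per-column max keeps exp() stable.
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1    # linear step into the hidden layer
    A1 = ReLU(Z1)          # hidden-layer activation
    Z2 = W2.dot(A1) + b2   # linear step into the output layer
    A2 = softmax(Z2)       # per-example class probabilities
    return Z1, A1, Z2, A2
```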
Back Propagation
- Error Calculation: Evaluate error in predictions (difference between predicted and actual)
- Weight & Bias Adjustment: Using the computed error, adjust weights and biases through derived relationships
- Learning Rate (α): Determines the update magnitude (hyperparameter)
- Update Equations:
  - W1 = W1 - α * dW1
  - B1 = B1 - α * dB1
  - Same for W2 and B2
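A sketch of the gradient computation feeding the update equations above, assuming the standard softmax-plus-cross-entropy gradient dZ2 = A2 - Y (with Y one-hot encoded) and (Z1 > 0) as the ReLU derivative:

```python
import numpy as np

def one_hot(Y, num_classes=10):
    # Turn integer labels of shape (m,) into a (10, m) one-hot matrix.
    Y_hot = np.zeros((num_classes, Y.size))
    Y_hot[Y, np.arange(Y.size)] = 1
    return Y_hot

def backward_prop(Z1, A1, A2, W2, X, Y):
    m = Y.size
    dZ2 = A2 - one_hot(Y)                      # error at the output layer
    dW2 = dZ2.dot(A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = W2.T.dot(dZ2) * (Z1 > 0)             # propagate error through ReLU
    dW1 = dZ1.dot(X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # The update equations above: parameter = parameter - alpha * gradient.
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```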
Implementation
Libraries and Data Preparation
- Libraries: NumPy, Pandas
- Data Loading: Using Pandas to read CSV and convert to NumPy arrays
- Data Splitting: Training data (actual learning) and Dev data (cross-validation/testing)
- Shuffling and Transposing: Ensuring each column represents an example
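A sketch of these preparation steps; the file name "train.csv", the 1,000-example dev split, and the division by 255 (scaling pixels to [0, 1]) are assumptions based on the Kaggle MNIST CSV layout (label in the first column, 784 pixel columns after it):

```python
import numpy as np
import pandas as pd

data = pd.read_csv("train.csv").to_numpy()   # hypothetical path to the MNIST CSV
np.random.shuffle(data)                      # shuffle rows before splitting

dev = data[:1000].T                          # transpose: columns become examples
Y_dev = dev[0]                               # first row holds the labels
X_dev = dev[1:] / 255.0                      # remaining 784 rows hold scaled pixels

train = data[1000:].T
Y_train = train[0]
X_train = train[1:] / 255.0
```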
Functions and Layers Implementation
- Parameter Initialization: Function to initialize W1, W2, B1, B2 with random values
- Forward Propagation: Function to carry signals from input to output using weights, biases, and activation functions
- ReLU and Softmax Functions: Defined for the hidden-layer activation and the output layer
- Back Propagation: Function to compute gradients and update weights and biases
- Update Parameters: Adjust weights and biases based on backprop results
- Gradient Descent: Main loop that iterates, calling forward and backpropagation, updating weights/biases
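A sketch of the main training loop that ties the earlier sketches together (init_params, forward_prop, backward_prop, update_params); the logging interval and the commented-out hyperparameters are assumptions:

```python
import numpy as np

def get_predictions(A2):
    # Predicted digit = index of the largest probability in each column.
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.mean(predictions == Y)

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, A2, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2,
                                       dW1, db1, dW2, db2, alpha)
        if i % 50 == 0:
            acc = get_accuracy(get_predictions(A2), Y)
            print(f"iteration {i}: training accuracy {acc:.3f}")
    return W1, b1, W2, b2

# Hypothetical call; learning rate and iteration count are guesses.
# W1, b1, W2, b2 = gradient_descent(X_train, Y_train, alpha=0.10, iterations=500)
```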
Coding in Kaggle
- Environment: Used Kaggle notebooks for coding, leveraging easy data handling
- Implementation of Functions: Step-by-step coding of forward prop, back prop, and parameter updates
- Iterative Training: Running multiple iterations to minimize error and adjust weights
- Debugging: Fixing initialization issues and making sure the functions integrate correctly
Model Performance
- Training Accuracy: ~84% accuracy on training data within 30 minutes
- Validation Accuracy: ~85.5% accuracy on cross-validation data (effectively testing data)
- Error Analysis: Examples where the model misclassifies (e.g., tricky digits like rotated numbers)
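A small sketch of how the dev-set accuracy quoted above could be computed, reusing the earlier helpers; the variable names follow the earlier data-preparation sketch:

```python
def make_predictions(X, W1, b1, W2, b2):
    # Forward pass only; return the predicted digit for each column of X.
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    return get_predictions(A2)

# Hypothetical evaluation on the held-out dev split:
# dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
# print(get_accuracy(dev_predictions, Y_dev))
```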
Conclusion
- Satisfaction: High satisfaction in creating, debugging, and seeing model accuracy improve
- Next Steps for Enhancement: Trying more layers, regularization, different optimization methods (e.g., momentum, RMSProp, Adam)
- Resources: Link to blog post and notebook for further exploration
- Final Thoughts: Building from scratch gives deep understanding and satisfaction compared to using higher-level libraries
- Thank You: Thanks for watching and engaging!