Neural Networks from Scratch: Regularization with L1 and L2
Introduction
- Topic: Regularization, specifically L1 and L2 Regularization in Neural Networks
- Agenda:
- Importance of Regularization
- Benefits of Regularization
- Coding Blocks for Regularization
- Forward and Backward Passes with Regularization
- Code Implementation and Demonstration
Generalization in Neural Networks
- Generalization: Ability of a neural network to perform well on new, unseen data
- Overfitting: Model memorizes training data and performs poorly on test data
- Example:
- Model 1 (overfit): Complex, fits training data perfectly, poor test performance
- Model 2 (generalized): Simpler, some training errors, better test performance
Motivation for Regularization
- Goal: Improve generalization and prevent overfitting
- Overfitting: Results in complex models with large weights and biases
- Desired: Simpler models with smaller weights and biases
- Regularization: Adds penalty to large weights and biases
Types of Regularization
L1 Regularization (Lasso)
- Loss Function Addition:
Lambda * sum(|weights|)
- Penalty: Grows linearly with the absolute weight value, so even small weights are penalized noticeably; can drive weights exactly to zero, promoting sparsity
L2 Regularization (Ridge)
- Loss Function Addition:
Lambda * sum(weights^2)
- Generally Preferred: Quadratic penalty is tiny for small weights, so it pushes weights toward small values without forcing them to zero
- Combination: L1 and L2 are often used together for better performance (a short penalty-computation sketch follows this list)
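A minimal NumPy sketch of the two penalty terms described above; the weight values and lambda settings here are illustrative placeholders, not taken from the notebook:

```python
import numpy as np

# Example weight matrix (2 inputs, 3 neurons) and illustrative lambda values
weights = np.array([[ 0.5, -0.2,  0.0],
                    [-1.3,  0.7,  0.1]])
lambda_l1 = 5e-4
lambda_l2 = 5e-4

# L1 penalty: lambda * sum of absolute weight values (promotes sparsity)
l1_penalty = lambda_l1 * np.sum(np.abs(weights))

# L2 penalty: lambda * sum of squared weight values (promotes small weights)
l2_penalty = lambda_l2 * np.sum(weights ** 2)

print(l1_penalty, l2_penalty)
```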
Implementing Regularization
Augmentation in Layer Class
- Initialization: Need to add lambda parameters for weights and biases
- Four parameters: weight_lambda_L1, weight_lambda_L2, bias_lambda_L1, bias_lambda_L2
- Forward Pass: Add regularization term to data loss
- Data Loss: Prediction error
- Regularization Loss: Depends on chosen regularizer (L1 or L2)
- Backward Pass:
- L2: Updates the weight (and bias) gradient terms by adding
  2 * lambda * weights
- L1: Updates the gradient terms by adding lambda * sign(weight):
  +lambda if weight > 0, -lambda if weight < 0
  (see the combined layer sketch after this list)
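A minimal sketch, assuming NumPy, of how the layer class could be augmented with the four lambda parameters, a regularization-loss term for the forward path, and the extra gradient terms in the backward pass. The class and parameter names follow the list above; the weight initialization and matrix shapes are illustrative rather than copied from the notebook:

```python
import numpy as np

class DenseLayer:
    # Dense layer carrying the four regularization strengths listed above
    def __init__(self, n_inputs, n_neurons,
                 weight_lambda_L1=0.0, weight_lambda_L2=0.0,
                 bias_lambda_L1=0.0, bias_lambda_L2=0.0):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        self.weight_lambda_L1 = weight_lambda_L1
        self.weight_lambda_L2 = weight_lambda_L2
        self.bias_lambda_L1 = bias_lambda_L1
        self.bias_lambda_L2 = bias_lambda_L2

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases

    # Regularization loss, added to the data loss in the forward path
    def regularization_loss(self):
        loss = 0.0
        if self.weight_lambda_L1 > 0:
            loss += self.weight_lambda_L1 * np.sum(np.abs(self.weights))
        if self.weight_lambda_L2 > 0:
            loss += self.weight_lambda_L2 * np.sum(self.weights ** 2)
        if self.bias_lambda_L1 > 0:
            loss += self.bias_lambda_L1 * np.sum(np.abs(self.biases))
        if self.bias_lambda_L2 > 0:
            loss += self.bias_lambda_L2 * np.sum(self.biases ** 2)
        return loss

    def backward(self, dvalues):
        # Gradients from the data loss
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        # L1 term: lambda * sign(parameter)
        if self.weight_lambda_L1 > 0:
            self.dweights += self.weight_lambda_L1 * np.sign(self.weights)
        if self.bias_lambda_L1 > 0:
            self.dbiases += self.bias_lambda_L1 * np.sign(self.biases)
        # L2 term: 2 * lambda * parameter
        if self.weight_lambda_L2 > 0:
            self.dweights += 2 * self.weight_lambda_L2 * self.weights
        if self.bias_lambda_L2 > 0:
            self.dbiases += 2 * self.bias_lambda_L2 * self.biases
        # Gradient passed to the previous layer
        self.dinputs = dvalues @ self.weights.T
```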
Coding Example
- Layer Instance Creation: Initialize layers with needed parameters
- Inputs, neurons, and regularization lambdas
- Forward Pass: Compute outputs, add data and regularization loss
- Backward Pass: Compute gradients, update using regularization terms
- Training: Include the regularization loss in the total loss and track validation accuracy (a usage sketch follows this list)
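A usage sketch building on the hypothetical DenseLayer above. The input data, upstream gradient, and learning rate are placeholders; in the notebook the data loss and its gradient would come from an actual loss function:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(10, 2)              # 10 samples, 2 features (placeholder data)

# Layer instance with inputs, neurons, and regularization lambdas
dense = DenseLayer(2, 3, weight_lambda_L2=5e-4, bias_lambda_L2=5e-4)

# Forward pass: outputs plus data loss and regularization loss
dense.forward(X)
data_loss = 0.0                         # would come from a loss function (e.g. cross-entropy)
total_loss = data_loss + dense.regularization_loss()

# Backward pass: gradients with the regularization terms added inside backward()
dvalues = np.ones_like(dense.output)    # placeholder upstream gradient
dense.backward(dvalues)

# Simple SGD-style update using the regularized gradients
learning_rate = 0.05
dense.weights -= learning_rate * dense.dweights
dense.biases -= learning_rate * dense.dbiases
```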
Detailed Code Walkthrough
- Data Definition: Example with spiral dataset
- Layer Initialization: Define inputs, neurons, and regularization params
- Forward/Backward Pass: Detailed implementation steps for calculation
- Validation: Evaluate performance on held-out test data (a brief validation sketch follows this list)
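A small illustrative check of the validation step. The labels and scores below are random placeholders standing in for a real forward pass of the trained layers over a held-out spiral split:

```python
import numpy as np

def accuracy(scores, y_true):
    # scores: (n_samples, n_classes) class scores; y_true: integer class labels
    return np.mean(np.argmax(scores, axis=1) == y_true)

rng = np.random.default_rng(0)
y_test = rng.integers(0, 3, size=100)            # placeholder labels for 3 spiral classes
test_scores = rng.standard_normal((100, 3))      # placeholder scores from the network

print(f"validation accuracy: {accuracy(test_scores, y_test):.3f}")
```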
Results & Observations
- Test Accuracy: Improved with regularization
- Without Regularization: Higher overfitting, lower test accuracy
- Effect of Training Data Size: Increasing training data improves model performance
- More data results in smoother model predictions and higher validation accuracy
Conclusion
- Summary: Regularization (L1 & L2) crucial for improving generalization and avoiding overfitting
- Next Steps: Experiment with different regularization parameters, data sizes
- Further Learning: Complete the remaining parts of the series for a deeper understanding of neural network fundamentals
Call to Action: Share this notebook and experiment with settings for deeper comprehension.