Neural Networks from Scratch: Regularization with L1 and L2

Jul 10, 2024

Introduction

  • Topic: Regularization, specifically L1 and L2 Regularization in Neural Networks
  • Agenda:
    • Importance of Regularization
    • Benefits of Regularization
    • Coding Blocks for Regularization
    • Forward and Backward Paths with Regularization
    • Code Implementation and Demonstration

Generalization in Neural Networks

  • Generalization: Ability of a neural network to perform well on new, unseen data
  • Overfitting: Model memorizes training data and performs poorly on test data
    • Example:
      • Model 1 (overfit): Complex, fits training data perfectly, poor test performance
      • Model 2 (generalized): Simpler, some training errors, better test performance

Motivation for Regularization

  • Goal: Improve generalization and prevent overfitting
  • Overfitting: Results in complex models with large weights and biases
  • Desired: Simpler models with smaller weights and biases
    • Regularization: Adds penalty to large weights and biases

Types of Regularization

L1 Regularization (Lasso)

  • Loss Function Addition: Lambda * sum(|weights|)
  • Penalty: Grows linearly with |weight|, so compared with L2 it penalizes small weights relatively heavily; can drive some weights exactly to zero, promoting sparsity

L2 Regularization (Ridge)

  • Loss Function Addition: Lambda * sum(weights^2)
  • Often preferred: the squared term penalizes small weights only lightly, pushing weights toward small values without forcing them to exactly zero
  • Combination: L1 and L2 are often used together for better performance (both penalty terms appear in the sketch below)
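
A minimal NumPy sketch of both penalty terms; the weight matrix and lambda values below are illustrative, not taken from the notebook:

    import numpy as np

    weights = np.array([[0.5, -1.2],
                        [0.0,  2.0]])
    lambda_l1 = 5e-4   # illustrative L1 strength
    lambda_l2 = 5e-4   # illustrative L2 strength

    l1_penalty = lambda_l1 * np.sum(np.abs(weights))   # lambda * sum(|w|)
    l2_penalty = lambda_l2 * np.sum(weights ** 2)      # lambda * sum(w^2)

    print("L1 penalty:", l1_penalty, "L2 penalty:", l2_penalty)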

Implementing Regularization

Augmentation in Layer Class

  • Initialization: Need to add lambda parameters for weights and biases
    • Four parameters: weight_lambda_L1, weight_lambda_L2, bias_lambda_L1, bias_lambda_L2
  • Forward Pass: Add regularization term to data loss
    • Data Loss: Prediction error
    • Regularization Loss: Depends on chosen regularizer (L1 or L2)
  • Backward Pass (see the layer sketch below):
    • L2: Adds 2 * lambda * weights to the weight gradients
    • L1: Adds lambda * sign(weight) to the weight gradients:
      • +lambda if weight > 0
      • -lambda if weight < 0
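
A sketch of how a layer class could carry the four lambda parameters and apply them in the forward and backward passes, assuming a plain NumPy dense layer; the class name DenseLayer and the regularization_loss helper are stand-ins and may differ from the notebook's code:

    import numpy as np

    class DenseLayer:
        def __init__(self, n_inputs, n_neurons,
                     weight_lambda_L1=0.0, weight_lambda_L2=0.0,
                     bias_lambda_L1=0.0, bias_lambda_L2=0.0):
            self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
            self.biases = np.zeros((1, n_neurons))
            # Regularization strengths for weights and biases
            self.weight_lambda_L1 = weight_lambda_L1
            self.weight_lambda_L2 = weight_lambda_L2
            self.bias_lambda_L1 = bias_lambda_L1
            self.bias_lambda_L2 = bias_lambda_L2

        def forward(self, inputs):
            self.inputs = inputs
            self.output = inputs @ self.weights + self.biases
            return self.output

        def regularization_loss(self):
            # Added to the data loss during the forward pass
            loss = 0.0
            loss += self.weight_lambda_L1 * np.sum(np.abs(self.weights))
            loss += self.weight_lambda_L2 * np.sum(self.weights ** 2)
            loss += self.bias_lambda_L1 * np.sum(np.abs(self.biases))
            loss += self.bias_lambda_L2 * np.sum(self.biases ** 2)
            return loss

        def backward(self, dvalues):
            # Gradients from the data loss
            self.dweights = self.inputs.T @ dvalues
            self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
            # L1 term: lambda * sign(w)  (+lambda if w > 0, -lambda if w < 0)
            if self.weight_lambda_L1 > 0:
                self.dweights += self.weight_lambda_L1 * np.sign(self.weights)
            if self.bias_lambda_L1 > 0:
                self.dbiases += self.bias_lambda_L1 * np.sign(self.biases)
            # L2 term: 2 * lambda * w
            if self.weight_lambda_L2 > 0:
                self.dweights += 2 * self.weight_lambda_L2 * self.weights
            if self.bias_lambda_L2 > 0:
                self.dbiases += 2 * self.bias_lambda_L2 * self.biases
            # Gradient with respect to this layer's inputs
            self.dinputs = dvalues @ self.weights.T
            return self.dinputs

    # Example: a layer with L2 regularization on its weights only
    layer = DenseLayer(2, 4, weight_lambda_L2=5e-4)
    out = layer.forward(np.random.randn(5, 2))
    layer.backward(np.ones_like(out))
    print("regularization loss:", layer.regularization_loss())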

Coding Example

  • Layer Instance Creation: Initialize layers with needed parameters
    • Inputs, neurons, and regularization lambdas
  • Forward Pass: Compute outputs, add data and regularization loss
  • Backward Pass: Compute gradients, update using regularization terms
  • Training: Include regularization in the total loss and track validation accuracy (a minimal end-to-end sketch follows this list)
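
A minimal, self-contained training-loop sketch (a toy linear regression with mean-squared-error loss rather than the notebook's classifier) showing where the regularization loss and gradients enter each iteration; all names and hyperparameters here are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([[1.5], [-2.0], [0.0]])
    y = X @ true_w + 0.1 * rng.normal(size=(200, 1))

    weights = 0.01 * rng.normal(size=(3, 1))
    biases = np.zeros((1, 1))
    lr, lambda_l1, lambda_l2 = 0.05, 1e-3, 1e-3

    for epoch in range(200):
        # Forward pass: predictions and data loss (mean squared error)
        y_pred = X @ weights + biases
        data_loss = np.mean((y_pred - y) ** 2)
        # Regularization loss added to the data loss
        reg_loss = lambda_l1 * np.sum(np.abs(weights)) + lambda_l2 * np.sum(weights ** 2)
        total_loss = data_loss + reg_loss

        # Backward pass: data-loss gradients ...
        dpred = 2 * (y_pred - y) / len(X)
        dweights = X.T @ dpred
        dbiases = np.sum(dpred, axis=0, keepdims=True)
        # ... plus the regularization gradients
        dweights += lambda_l1 * np.sign(weights) + 2 * lambda_l2 * weights

        # Gradient-descent update
        weights -= lr * dweights
        biases -= lr * dbiases

    print("final loss:", float(total_loss), "weights:", weights.ravel())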

Detailed Code Walkthrough

  • Data Definition: Example with spiral dataset
  • Layer Initialization: Define inputs, neurons, and regularization params
  • Forward/Backward Pass: Step-by-step implementation of the loss and gradient calculations
  • Validation: Evaluate performance on held-out data (a stand-in data/validation sketch follows this list)
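
A stand-in sketch of spiral-style data generation and a validation-accuracy check; the spiral_data and accuracy helpers are hypothetical and the notebook's own data utilities may differ:

    import numpy as np

    def spiral_data(points_per_class, num_classes, seed=0):
        # Generates a noisy spiral for each class: 2-D inputs X, class labels y
        rng = np.random.default_rng(seed)
        X = np.zeros((points_per_class * num_classes, 2))
        y = np.zeros(points_per_class * num_classes, dtype=int)
        for c in range(num_classes):
            ix = np.arange(points_per_class * c, points_per_class * (c + 1))
            r = np.linspace(0.0, 1.0, points_per_class)                # radius
            t = np.linspace(c * 4, (c + 1) * 4, points_per_class)      # angle
            t = t + rng.normal(scale=0.2, size=points_per_class)       # noise
            X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
            y[ix] = c
        return X, y

    # Separate training and validation sets
    X_train, y_train = spiral_data(100, 3, seed=0)
    X_val, y_val = spiral_data(100, 3, seed=1)

    def accuracy(scores, targets):
        # scores: (n_samples, n_classes) raw outputs; targets: class indices
        return np.mean(np.argmax(scores, axis=1) == targets)

    # Example: random scores give roughly chance-level (~0.33) validation accuracy
    dummy_scores = np.random.default_rng(2).normal(size=(len(X_val), 3))
    print("validation accuracy:", accuracy(dummy_scores, y_val))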

Results & Observations

  • Test Accuracy: Improved with regularization
  • Without Regularization: Higher overfitting, lower test accuracy
  • Effect of Training Data Size: Increasing training data improves model performance
    • More data results in smoother model predictions and higher validation accuracy

Conclusion

  • Summary: Regularization (L1 & L2) crucial for improving generalization and avoiding overfitting
  • Next Steps: Experiment with different regularization parameters, data sizes
  • Further Learning: Complete the remaining parts of the series for a deeper understanding of neural network fundamentals

Call to Action: Share this notebook and experiment with settings for deeper comprehension.