🧠

Manual Backpropagation Implementation in Neural Networks

Jul 14, 2024

Manual Implementation of Backpropagation in a Neural Network

Introduction

  • Continuation of the topic of multilayer perceptrons.
  • Upcoming introduction to Recurrent Neural Networks (RNNs).
  • The goal is to understand how loss.backward() in PyTorch works by removing it and writing the backward pass manually.
  • Importance of understanding backpropagation as a 'leaky abstraction'.

Common Problems in Backpropagation

  • Gradient saturation in activation functions (see the sketch after this list).
  • Dead neurons.
  • Exploding or vanishing gradients in RNNs.
  • Subtle errors in gradient implementations.
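
A minimal sketch (standard PyTorch only, with made-up input values) of how a saturated tanh squashes the gradient flowing through it, which is the mechanism behind both gradient saturation and dead neurons:

    import torch

    # One pre-activation in the linear regime of tanh, one deeply saturated.
    x = torch.tensor([0.1, 8.0], requires_grad=True)
    y = torch.tanh(x)
    y.sum().backward()

    # d tanh(x)/dx = 1 - tanh(x)**2: close to 1 near zero, essentially 0 when saturated.
    print(x.grad)  # roughly tensor([9.9e-01, 4.5e-07])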

Practical Exercise

  • The proposed exercise: implement the backward pass manually at the tensor level (see the sketch below for the kind of forward pass involved).
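
A hedged sketch of the kind of forward pass this exercise assumes, broken into small tensor operations so that each intermediate value can receive its own manually derived gradient. The names match those used in the technical details further down; logits, Yb and N are assumed inputs, and the decomposition here is illustrative rather than the exact one in the original code:

    counts = logits.exp()
    counts_sum = counts.sum(dim=1, keepdim=True)
    counts_sum_inv = counts_sum ** -1
    probs = counts * counts_sum_inv
    log_probs = probs.log()
    loss = -log_probs[range(N), Yb].mean()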

Historical Backpropagation

  • About ten years ago, the backward pass in deep learning was typically written by hand.
  • Example: the 2006 work by Hinton and Salakhutdinov.
  • Implementations were written in MATLAB rather than Python at that time.
  • Writing gradients by hand was seen as improving understanding.

Manual Implementation of the Backward Pass

  • Visual description of the forward pass and calculation of the backward pass in stages.
  • Use of retain_grad() on intermediate tensors so their gradients can be compared with PyTorch's (see the sketch after this list).
  • Numerical issues and adjustments in variable initialization.
  • Examples of gradient calculations for specific operations (cross-entropy, batch normalization).
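
A sketch of that comparison workflow. retain_grad() is the standard PyTorch call that keeps gradients on non-leaf tensors; the intermediate tensor names (log_probs, probs, counts) and the loss are assumed to come from the forward pass above:

    import torch

    # Keep gradients on intermediate (non-leaf) tensors so PyTorch stores them.
    for t in [log_probs, probs, counts]:
        t.retain_grad()
    loss.backward()

    def cmp(name, manual, tensor):
        # Compare a manually derived gradient against the one PyTorch computed.
        exact = torch.all(manual == tensor.grad).item()
        approx = torch.allclose(manual, tensor.grad)
        print(f'{name:12s} | exact: {exact} | approximate: {approx}')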

Technical Details

  1. Gradient of log_probs:

    # loss = -log_probs[range(N), Yb].mean(), so only the picked entries receive -1/N.
    d_log_probs = torch.zeros_like(log_probs)
    d_log_probs[range(N), Yb] = -1.0 / N

    Verification with assert torch.allclose(d_log_probs, log_probs.grad).

  2. Gradient of probs:

    # log_probs = probs.log(), so by the chain rule d_probs = (1 / probs) * d_log_probs.
    d_probs = (1.0 / probs) * d_log_probs

    Verification with assert torch.allclose(d_probs, probs.grad).

  3. Gradient through the multiplication probs = counts * counts_sum_inv:

    # counts_sum_inv is broadcast across each row; check shapes and broadcasting.
    d_counts = counts_sum_inv * d_probs
    d_counts_sum_inv = (counts * d_probs).sum(dim=1, keepdim=True)
  4. Backpropagation through linear layers (matrix multiplication plus bias; see the sketch after this list).

  5. Manual gradient derivation using algebra: a pen-and-paper exercise to verify the results.
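
For item 4, a sketch of the tensor-level gradients through a linear layer h = X @ W + b, where d_h is the upstream gradient with the same shape as h (the shapes shown are assumptions for illustration):

    # Forward: h = X @ W + b, with X: (N, D), W: (D, M), b: (M,), h: (N, M)
    d_X = d_h @ W.T           # (N, M) @ (M, D) -> (N, D)
    d_W = X.T @ d_h           # (D, N) @ (N, M) -> (D, M)
    d_b = d_h.sum(dim=0)      # b was broadcast over the batch, so sum that dimension out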

Simplification Using Algebra

  • Exercise: derive the loss function (cross-entropy) and batch normalization by hand on paper, so that the backward pass collapses into a single step instead of many intermediate operations.
  • Example derivative showing how much the calculation shrinks (see the sketch below).
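
A sketch of the collapsed cross-entropy backward pass referenced above: after the paper derivation, the gradient of the mean cross-entropy loss with respect to the logits reduces to "softmax minus the one-hot targets, divided by the batch size" (logits, Yb and N are assumed from the forward pass):

    import torch.nn.functional as F

    # d_logits[i, j] = (softmax(logits)[i, j] - 1{j == Yb[i]}) / N
    d_logits = F.softmax(logits, dim=1)
    d_logits[range(N), Yb] -= 1.0
    d_logits /= N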

Final Implementation

  • Coding the gradient of the cross-entropy loss directly with respect to the logits.
  • Importance of keeping all shapes consistent during gradient calculation.
  • Exercise: chain together all the manually calculated gradients step by step and eliminate loss.backward() (see the sketch below).
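
A minimal sketch of that final integration step, assuming the manually computed gradients have been collected in a list grads ordered like parameters; the update then uses them directly, with no call to loss.backward():

    # Hypothetical training-step fragment: update parameters from the manual gradients.
    learning_rate = 0.1  # assumed value for illustration
    for p, grad in zip(parameters, grads):
        p.data += -learning_rate * grad   # in-place update, bypassing autograd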

Conclusion

  • Pedagogical benefit of implementing gradients by hand for a better understanding of the neural network.
  • Although not practical for day-to-day work, the exercise gives a deeper understanding of the learning process and of how parameters are adjusted in a neural network.
  • Next step: Introduction to Recurrent Neural Networks (RNNs) and their variants.