🧠

Manual Backpropagation Implementation in Neural Networks

Jul 14, 2024

Manual Implementation of Backpropagation in a Neural Network

Introduction

  • Continuation of the topic of multilayer perceptrons.
  • Upcoming introduction to Recurrent Neural Networks (RNNs).
  • The goal is to understand how loss.backward() in PyTorch works by removing it and writing the backward pass manually.
  • Importance of understanding backpropagation as a 'leaky abstraction'.

Common Problems in Backpropagation

  • Gradient saturation in activation functions (see the sketch after this list).
  • Dead neurons.
  • Exploding or vanishing gradients in RNNs.
  • Subtle errors in gradient implementations.
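
A minimal sketch (standard PyTorch only, with made-up input values) of how a saturated tanh squashes the gradient flowing through it, which is the mechanism behind both gradient saturation and dead neurons:

    import torch

    # One pre-activation in the linear regime of tanh, one deeply saturated.
    x = torch.tensor([0.1, 8.0], requires_grad=True)
    y = torch.tanh(x)
    y.sum().backward()

    # d tanh(x)/dx = 1 - tanh(x)**2: close to 1 near zero, essentially 0 when saturated.
    print(x.grad)  # roughly tensor([9.9e-01, 4.5e-07])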

Practical Exercise

  • The proposed exercise: implement the backward pass manually at the tensor level (see the sketch below for the kind of forward pass involved).
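
A hedged sketch of the kind of forward pass this exercise assumes, broken into small tensor operations so that each intermediate value can receive its own manually derived gradient. The names match those used in the technical details further down; logits, Yb and N are assumed inputs, and the decomposition here is illustrative rather than the exact one in the original code:

    counts = logits.exp()
    counts_sum = counts.sum(dim=1, keepdim=True)
    counts_sum_inv = counts_sum ** -1
    probs = counts * counts_sum_inv
    log_probs = probs.log()
    loss = -log_probs[range(N), Yb].mean()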

Historical Backpropagation

  • About ten years ago, the backward pass in deep learning was typically written by hand.
  • Example: the 2006 work by Hinton and Salakhutdinov.
  • Implementations were written in MATLAB rather than Python at that time.
  • Writing gradients by hand was seen as improving understanding.

Manual Implementation of the Backward Pass

  • Visual description of the forward pass and calculation of the backward pass in stages.
  • Use of retain_grad() on intermediate tensors so their gradients can be compared with PyTorch's (see the sketch after this list).
  • Numerical issues and adjustments in variable initialization.
  • Examples of gradient calculations for specific operations (cross-entropy, batch normalization).
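
A sketch of that comparison workflow. retain_grad() is the standard PyTorch call that keeps gradients on non-leaf tensors; the intermediate tensor names (log_probs, probs, counts) and the loss are assumed to come from the forward pass above:

    import torch

    # Keep gradients on intermediate (non-leaf) tensors so PyTorch stores them.
    for t in [log_probs, probs, counts]:
        t.retain_grad()
    loss.backward()

    def cmp(name, manual, tensor):
        # Compare a manually derived gradient against the one PyTorch computed.
        exact = torch.all(manual == tensor.grad).item()
        approx = torch.allclose(manual, tensor.grad)
        print(f'{name:12s} | exact: {exact} | approximate: {approx}')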

Technical Details

  1. Gradient of log_probs:

    # loss = -log_probs[range(N), Yb].mean(), so only the picked entries receive -1/N.
    d_log_probs = torch.zeros_like(log_probs)
    d_log_probs[range(N), Yb] = -1.0 / N

    Verification with assert torch.allclose(d_log_probs, log_probs.grad).

  2. Gradient of probs:

    # log_probs = probs.log(), so by the chain rule d_probs = (1 / probs) * d_log_probs.
    d_probs = (1.0 / probs) * d_log_probs

    Verification with assert torch.allclose(d_probs, probs.grad).

  3. Gradient through the multiplication probs = counts * counts_sum_inv:

    # counts_sum_inv is broadcast across each row; check shapes and broadcasting.
    d_counts = counts_sum_inv * d_probs
    d_counts_sum_inv = (counts * d_probs).sum(dim=1, keepdim=True)
  4. Backpropagation through linear layers (matrix multiplication plus bias; see the sketch after this list).

  5. Manual gradient derivation using algebra: a pen-and-paper exercise to verify the results.
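
For item 4, a sketch of the tensor-level gradients through a linear layer h = X @ W + b, where d_h is the upstream gradient with the same shape as h (the shapes shown are assumptions for illustration):

    # Forward: h = X @ W + b, with X: (N, D), W: (D, M), b: (M,), h: (N, M)
    d_X = d_h @ W.T           # (N, M) @ (M, D) -> (N, D)
    d_W = X.T @ d_h           # (D, N) @ (N, M) -> (D, M)
    d_b = d_h.sum(dim=0)      # b was broadcast over the batch, so sum that dimension out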

Simplification Using Algebra

  • Exercise: derive the loss function (cross-entropy) and batch normalization by hand on paper, so that the backward pass collapses into a single step instead of many intermediate operations.
  • Example derivative showing how much the calculation shrinks (see the sketch below).
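
A sketch of the collapsed cross-entropy backward pass referenced above: after the paper derivation, the gradient of the mean cross-entropy loss with respect to the logits reduces to "softmax minus the one-hot targets, divided by the batch size" (logits, Yb and N are assumed from the forward pass):

    import torch.nn.functional as F

    # d_logits[i, j] = (softmax(logits)[i, j] - 1{j == Yb[i]}) / N
    d_logits = F.softmax(logits, dim=1)
    d_logits[range(N), Yb] -= 1.0
    d_logits /= N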

Final Implementation

  • Coding the gradient of the cross-entropy loss directly with respect to the logits.
  • Importance of keeping all shapes consistent during gradient calculation.
  • Exercise: chain together all the manually calculated gradients step by step and eliminate loss.backward() (see the sketch below).
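
A minimal sketch of that final integration step, assuming the manually computed gradients have been collected in a list grads ordered like parameters; the update then uses them directly, with no call to loss.backward():

    # Hypothetical training-step fragment: update parameters from the manual gradients.
    learning_rate = 0.1  # assumed value for illustration
    for p, grad in zip(parameters, grads):
        p.data += -learning_rate * grad   # in-place update, bypassing autograd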

Conclusion

  • Pedagogical benefit of implementing gradients by hand for a better understanding of the neural network.
  • Although not practical for day-to-day work, the exercise gives a deeper understanding of the learning process and of how parameters are adjusted in a neural network.
  • Next step: Introduction to Recurrent Neural Networks (RNNs) and their variants.