Jul 14, 2024
Understanding how loss.backward() in PyTorch works by removing it and manually writing the backward pass.
Gradient of log_probs:
d_log_probs = torch.zeros_like(log_probs)
d_log_probs[range(N), Yb] = -1.0 / N
Verification with assert torch.allclose(d_log_probs, log_probs.grad).
Gradient of probs:
d_probs = (1.0 / probs) * d_log_probs
Verification with assert torch.allclose(d_probs, probs.grad).
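Both checks can be reproduced with a small self-contained harness; the batch size, vocab size, random logits and Yb below are made-up stand-ins, just enough to exercise the chain rule:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, V = 32, 27                                   # assumed batch size and vocab size
logits = torch.randn(N, V, requires_grad=True)
Yb = torch.randint(0, V, (N,))                  # assumed integer targets

probs = F.softmax(logits, dim=1)
log_probs = probs.log()
loss = -log_probs[range(N), Yb].mean()          # mean negative log-likelihood

probs.retain_grad()                             # keep intermediate grads for comparison
log_probs.retain_grad()
loss.backward()

# Only the N picked entries of log_probs touch the loss, each with weight -1/N.
d_log_probs = torch.zeros_like(log_probs)
d_log_probs[range(N), Yb] = -1.0 / N
assert torch.allclose(d_log_probs, log_probs.grad)

# One chain-rule step through the log: d log(p)/dp = 1/p.
d_probs = (1.0 / probs) * d_log_probs
assert torch.allclose(d_probs, probs.grad)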
Gradient through the multiplication node, probs = counts * counts_sum_inv (shapes and broadcasting need care):
d_counts = counts_sum_inv * d_probs  # counts_sum_inv is (N, 1) and broadcasts across the row
d_counts_sum_inv = (counts * d_probs).sum(dim=1, keepdim=True)  # sum out the broadcast dimension
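The broadcast-then-sum step is the part that is easy to get wrong; a tiny standalone check (a, b and d_c are arbitrary stand-ins for counts, counts_sum_inv and d_probs) shows why the broadcast factor needs the sum over dim 1:

import torch

torch.manual_seed(0)
a = torch.randn(4, 5, requires_grad=True)   # plays the role of counts
b = torch.randn(4, 1, requires_grad=True)   # plays the role of counts_sum_inv, broadcast per row
c = a * b                                   # b is replicated across the 5 columns

d_c = torch.randn_like(c)                   # stand-in for the upstream gradient
c.backward(d_c)

# Product rule: each factor gets (other factor) * (upstream gradient);
# wherever the forward pass broadcast b, the backward pass sums that dimension back out.
d_a = b * d_c
d_b = (a * d_c).sum(dim=1, keepdim=True)
assert torch.allclose(d_a, a.grad)
assert torch.allclose(d_b, b.grad)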
Backpropagation through linear layers (matrix multiplication of variables).
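A sketch of the same bookkeeping for a linear layer out = X @ W + b, checked against autograd; the shapes and random tensors are illustrative only:

import torch

torch.manual_seed(0)
X = torch.randn(8, 10, requires_grad=True)  # assumed batch of 8, 10 input features
W = torch.randn(10, 3, requires_grad=True)  # 3 output features
b = torch.randn(3, requires_grad=True)

out = X @ W + b
d_out = torch.randn_like(out)               # stand-in for the upstream gradient
out.backward(d_out)

# Backward through out = X @ W + b:
d_X = d_out @ W.T                           # (8,3) @ (3,10) -> (8,10)
d_W = X.T @ d_out                           # (10,8) @ (8,3) -> (10,3)
d_b = d_out.sum(dim=0)                      # b was broadcast over the batch, so sum it out
assert torch.allclose(d_X, X.grad)
assert torch.allclose(d_W, W.grad)
assert torch.allclose(d_b, b.grad)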
Manual gradient derivation using algebra: a pen-and-paper exercise to verify.
End-to-end check: the manual gradients should match what cross-entropy loss.backward() produces.
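Collapsing every step algebraically, the gradient of the mean cross-entropy loss with respect to the logits reduces to (softmax(logits) - one_hot(Yb)) / N, which can be checked directly against PyTorch's own backward (inputs assumed as before):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, V = 32, 27                                   # assumed batch size and vocab size
logits = torch.randn(N, V, requires_grad=True)
Yb = torch.randint(0, V, (N,))

loss = F.cross_entropy(logits, Yb)              # mean NLL over the batch
loss.backward()

# Collapsed paper result: d_logits = (softmax(logits) - one_hot(Yb)) / N
with torch.no_grad():
    d_logits = F.softmax(logits, dim=1)
    d_logits[range(N), Yb] -= 1.0
    d_logits /= N
assert torch.allclose(d_logits, logits.grad)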