
Mathematics of Neural Network Back Propagation

Jun 3, 2025

Lecture on Mathematics Behind Neural Networks

Introduction

  • Presenter: Adam Dalla
  • Focus: Mathematics and calculus in back propagation for neural networks
  • Note: This is not an introductory lecture on neural networks; it is an advanced treatment of the underlying mathematics.

Prerequisites

  • Basic Linear Algebra
    • Transposing, multiplying, adding matrices
    • Dot products and vectors
  • Multivariable Calculus
    • Should be familiar with differential calculus
    • Knowledge of Jacobians and gradients
  • Basic Machine Learning
    • Understanding of cost functions and gradient descent

Agenda

  1. Big Picture of Neural Networks and Back Propagation
    • Understanding goals and objectives in back propagation
  2. Review of Multivariable Calculus
    • Jacobians, gradients, derivatives with multiple variables
  3. Single Neuron as a Function
    • Inputs as scalars and vectors
    • Outputs as scalars
  4. Jacobians in Neural Networks
    • Analysis using a simple network
  5. Gradient Descent and Optimization
    • Stochastic Gradient Descent
  6. In-depth Back Propagation
    • Scalars, vectors, derivatives, matrices, and Jacobians

Notation and Definitions

  • M: Size of training set
  • N: Number of input variables
  • L (Capital): Number of layers in the network
  • l (Lowercase): Specific layer
  • W^l: Weight matrix of the connections going into layer l, i.e., from layer l-1 into layer l (see the layer formula after this list)
  • b^l: Bias vector of layer l
  • x_i: A single input variable
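
The quantities above combine layer by layer. Writing a^l for the activations of layer l and z^l for its weighted inputs (the "A" and "Z" values stored during the forward pass), and σ for the activation function:

```latex
a^{0} = x, \qquad
z^{l} = W^{l} a^{l-1} + b^{l}, \qquad
a^{l} = \sigma\!\left(z^{l}\right), \qquad l = 1, \ldots, L
```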

Key Concepts

Neural Networks as Functions

  • A neural network is a composition of smaller functions, one per layer (see the sketch after this list)
  • Parameters: Weights and biases
  • Inputs: Vector of variables
  • Outputs: Scalars or classification probabilities
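
As a rough sketch of this idea in code (the 2-3-1 layer sizes, random parameters, and sigmoid activation are illustrative assumptions, not specifics from the lecture), the whole network is literally the composition of its per-layer functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b):
    """One layer as a small function: an affine map followed by an activation."""
    return lambda a: sigmoid(W @ a + b)

# A tiny 2-3-1 network: its parameters are the weights and biases,
# its input is a vector, and its output is a single scalar activation.
rng = np.random.default_rng(0)
f1 = layer(rng.standard_normal((3, 2)), rng.standard_normal((3, 1)))
f2 = layer(rng.standard_normal((1, 3)), rng.standard_normal((1, 1)))
network = lambda x: f2(f1(x))

print(network(np.array([[0.5], [-1.0]])))
```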

Calculus in Neural Networks

  • Cost Function: Measures the network's error; minimized using calculus
  • Derivative of the Cost: Taken with respect to every weight and bias
  • Chain Rule: Carries derivatives through the composed layer functions (see the example after this list)
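
As a concrete example (assuming a mean-squared-error cost, which is only one common choice, and writing y_i for the target output of input x_i), the cost averages the error over the M training examples, and the chain rule splits its derivative with respect to a final-layer weight into factors the network can compute:

```latex
C = \frac{1}{2M} \sum_{i=1}^{M} \left\lVert a^{L}(x_i) - y_i \right\rVert^{2},
\qquad
\frac{\partial C}{\partial W^{L}_{jk}}
  = \frac{\partial C}{\partial a^{L}_{j}}
  \cdot \frac{\partial a^{L}_{j}}{\partial z^{L}_{j}}
  \cdot \frac{\partial z^{L}_{j}}{\partial W^{L}_{jk}}
```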

Matrix Calculus Basics

  • Gradients: Vectors of partial derivatives of vector-to-scalar functions
  • Jacobians: Matrices of partial derivatives of vector-to-vector functions (both are defined after this list)
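
Stated precisely:

```latex
\text{Gradient of } f : \mathbb{R}^{n} \to \mathbb{R}: \quad
\nabla f(x) = \left( \frac{\partial f}{\partial x_{1}}, \ldots, \frac{\partial f}{\partial x_{n}} \right)^{\!\top},
\qquad
\text{Jacobian of } f : \mathbb{R}^{n} \to \mathbb{R}^{m}: \quad
\left( J_{f} \right)_{ij} = \frac{\partial f_{i}}{\partial x_{j}}
\ \ (\text{an } m \times n \text{ matrix})
```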

Back Propagation

  • Built on derivatives and chain rule
  • Jacobians in Back Propagation
    • Key to computing the gradients used in gradient descent (see the chain-rule identity after this list)
  • Error and Cost Minimization
    • Using derivatives to adjust weights and biases so that the cost decreases
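
One way to see why Jacobians matter here (a standard identity rather than a formula quoted from the lecture): the gradient of the cost with respect to one layer's activations is the transposed Jacobian of the next layer applied to the next layer's gradient, and applying this repeatedly carries the gradient from the output back through the network:

```latex
\nabla_{a^{l}} C
  = \left( \frac{\partial a^{l+1}}{\partial a^{l}} \right)^{\!\top} \nabla_{a^{l+1}} C,
\qquad \text{where } \frac{\partial a^{l+1}}{\partial a^{l}}
  \text{ is the Jacobian of layer } l{+}1 \text{ with respect to its inputs}
```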

Derivatives in Neural Networks

  • Scalar Expansion: A scalar expanded (broadcast) into a vector so it can take part in vector operations
  • Sum Derivatives: The derivative of a sum distributes over its terms, which is how each term's impact on the output cost is accounted for (both rules are stated after this list)
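
A compact statement of both rules, on the assumption that "scalar expansion" refers to a scalar x broadcast into every component of a vector y:

```latex
y = (x, x, \ldots, x)^{\top} \in \mathbb{R}^{n}
  \;\Rightarrow\; \frac{\partial y_{i}}{\partial x} = 1 \ \text{for each } i,
\qquad
s = \sum_{i=1}^{n} y_{i}
  \;\Rightarrow\; \frac{\partial s}{\partial y_{i}} = 1
  \ \text{and} \ \frac{\partial s}{\partial x} = \sum_{i=1}^{n} \frac{\partial y_{i}}{\partial x} = n
```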

Practical Applications

Gradient Descent

  • Learning Rate: Controls the step size of each gradient descent update
  • Stochastic Gradient Descent (SGD): More efficient on larger datasets
    • Updates on mini-batches instead of the full batch (see the sketch after this list)
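
A minimal sketch of the update and the mini-batch loop, assuming the parameters and their gradients are kept as lists of NumPy arrays (all names here are illustrative):

```python
import numpy as np

def sgd_step(params, grads, eta):
    """One gradient-descent update: step each parameter against its gradient,
    scaled by the learning rate eta."""
    return [p - eta * g for p, g in zip(params, grads)]

def iterate_minibatches(X, Y, batch_size, rng):
    """Shuffle the training set and yield mini-batches -- the 'stochastic' part of SGD."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], Y[idx]
```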

Back Propagation in Depth

  • Error of a Node: How the cost changes when the node's weighted input (its value before the activation function is applied) is perturbed
  • Four Fundamental Equations of Back Propagation
    • Error of the final layer
    • Propagation of the error backwards, layer by layer
    • Derivatives of the cost with respect to the weights and biases (the equations are stated after this list)
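
In the notation above, writing δ^l for the error vector of layer l, σ for the activation function, and ⊙ for the elementwise product, the four equations are commonly written as:

```latex
\delta^{L} = \nabla_{a^{L}} C \odot \sigma'\!\left(z^{L}\right),
\qquad
\delta^{l} = \left( \left( W^{l+1} \right)^{\!\top} \delta^{l+1} \right) \odot \sigma'\!\left(z^{l}\right),
\qquad
\frac{\partial C}{\partial b^{l}_{j}} = \delta^{l}_{j},
\qquad
\frac{\partial C}{\partial W^{l}_{jk}} = a^{l-1}_{k}\, \delta^{l}_{j}
```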

Implementation

  • Forward Pass: Calculate the activations layer by layer, storing the Z (weighted input) and A (activation) values
  • Backward Pass: Calculate the errors and use them to compute the weight and bias gradients (see the sketch after this list)
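
A minimal NumPy sketch of both passes for a fully connected network, assuming sigmoid activations and a quadratic cost (the activation, cost, and all names are illustrative assumptions, not specifics from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_pass(x, weights, biases):
    """Compute activations layer by layer, storing the Z and A values for the backward pass."""
    a = x
    zs, activations = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b            # weighted input of this layer
        a = sigmoid(z)           # activation of this layer
        zs.append(z)
        activations.append(a)
    return zs, activations

def backward_pass(y, weights, zs, activations):
    """Propagate the error backwards and return the gradients of every layer's W and b."""
    grad_W = [np.zeros_like(W) for W in weights]
    grad_b = [np.zeros_like(z) for z in zs]
    # Error of the final layer (quadratic cost): delta = (a^L - y) * sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_b[-1] = delta
    grad_W[-1] = delta @ activations[-2].T
    # Carry the error back through the remaining layers
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grad_b[-l] = delta
        grad_W[-l] = delta @ activations[-l - 1].T
    return grad_W, grad_b
```

The gradients returned by the backward pass are exactly what a gradient-descent update subtracts, scaled by the learning rate, from the weights and biases.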

Conclusion

  • Back propagation enables adjustment of weights/biases to minimize cost
  • Gradient Descent: Method to optimize neural networks
  • Application: Neural network training with libraries like PyTorch and TensorFlow (a minimal example follows)
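
For comparison only, a minimal PyTorch sketch of the same loop, where autograd performs the backward pass automatically (the layer sizes, cost, and learning rate are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 3), nn.Sigmoid(), nn.Linear(3, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 2)        # a mini-batch of 8 inputs
y = torch.randn(8, 1)        # matching targets

optimizer.zero_grad()        # clear previously accumulated gradients
loss = loss_fn(model(x), y)  # forward pass and cost
loss.backward()              # backward pass: back propagation via autograd
optimizer.step()             # gradient-descent update of the weights and biases
```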

Future Directions

  • Practical implementation with real data (e.g., MNIST)
  • Exploring more complex neural network types (e.g., convolutional networks)

This lecture explored the mathematical foundations necessary for understanding the back propagation algorithm used in training neural networks, providing valuable insights into the role of calculus in machine learning.