
Mathematics of Neural Network Back Propagation

Jun 3, 2025

Lecture on Mathematics Behind Neural Networks

Introduction

  • Presenter: Adam Dalla
  • Focus: Mathematics and calculus in back propagation for neural networks
  • Note: This is not an introductory lecture on neural networks; it is an advanced treatment of the underlying mathematics.

Prerequisites

  • Basic Linear Algebra
    • Transposing, multiplying, adding matrices
    • Dot products and vectors
  • Multivariable Calculus
    • Should be familiar with differential calculus
    • Knowledge of Jacobians and gradients
  • Basic Machine Learning
    • Understanding of cost functions and gradient descent

Agenda

  1. Big Picture of Neural Networks and Back Propagation
    • Understanding goals and objectives in back propagation
  2. Review of Multivariable Calculus
    • Jacobians, gradients, derivatives with multiple variables
  3. Single Neuron as a Function
    • Inputs as scalars and vectors
    • Outputs as scalars
  4. Jacobians in Neural Networks
    • Analysis using a simple network
  5. Gradient Descent and Optimization
    • Stochastic Gradient Descent
  6. In-depth Back Propagation
    • Scalars, vectors, derivatives, matrices, and Jacobians

Notation and Definitions

  • M: Size of training set
  • N: Number of input variables
  • L (Capital): Number of layers in the network
  • l (Lowercase): Specific layer
  • W^l: Weight matrix of the connections going into layer l, i.e., from layer l-1 into layer l (see the layer formula after this list)
  • b^l: Bias vector of layer l
  • x_i: A single input variable
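
The quantities above combine layer by layer. Writing a^l for the activations of layer l and z^l for its weighted inputs (the "A" and "Z" values stored during the forward pass), and σ for the activation function:

```latex
a^{0} = x, \qquad
z^{l} = W^{l} a^{l-1} + b^{l}, \qquad
a^{l} = \sigma\!\left(z^{l}\right), \qquad l = 1, \ldots, L
```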

Key Concepts

Neural Networks as Functions

  • A neural network is a composition of smaller functions, one per layer (see the sketch after this list)
  • Parameters: Weights and biases
  • Inputs: Vector of variables
  • Outputs: Scalars or classification probabilities
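
As a rough sketch of this idea in code (the 2-3-1 layer sizes, random parameters, and sigmoid activation are illustrative assumptions, not specifics from the lecture), the whole network is literally the composition of its per-layer functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b):
    """One layer as a small function: an affine map followed by an activation."""
    return lambda a: sigmoid(W @ a + b)

# A tiny 2-3-1 network: its parameters are the weights and biases,
# its input is a vector, and its output is a single scalar activation.
rng = np.random.default_rng(0)
f1 = layer(rng.standard_normal((3, 2)), rng.standard_normal((3, 1)))
f2 = layer(rng.standard_normal((1, 3)), rng.standard_normal((1, 1)))
network = lambda x: f2(f1(x))

print(network(np.array([[0.5], [-1.0]])))
```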

Calculus in Neural Networks

  • Cost Function: Measures the network's error; minimized using calculus
  • Derivative of the Cost: Taken with respect to every weight and bias
  • Chain Rule: Carries derivatives through the composed layer functions (see the example after this list)
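
As a concrete example (assuming a mean-squared-error cost, which is only one common choice, and writing y_i for the target output of input x_i), the cost averages the error over the M training examples, and the chain rule splits its derivative with respect to a final-layer weight into factors the network can compute:

```latex
C = \frac{1}{2M} \sum_{i=1}^{M} \left\lVert a^{L}(x_i) - y_i \right\rVert^{2},
\qquad
\frac{\partial C}{\partial W^{L}_{jk}}
  = \frac{\partial C}{\partial a^{L}_{j}}
  \cdot \frac{\partial a^{L}_{j}}{\partial z^{L}_{j}}
  \cdot \frac{\partial z^{L}_{j}}{\partial W^{L}_{jk}}
```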

Matrix Calculus Basics

  • Gradients: Vectors of partial derivatives of vector-to-scalar functions
  • Jacobians: Matrices of partial derivatives of vector-to-vector functions (both are defined after this list)
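
Stated precisely:

```latex
\text{Gradient of } f : \mathbb{R}^{n} \to \mathbb{R}: \quad
\nabla f(x) = \left( \frac{\partial f}{\partial x_{1}}, \ldots, \frac{\partial f}{\partial x_{n}} \right)^{\!\top},
\qquad
\text{Jacobian of } f : \mathbb{R}^{n} \to \mathbb{R}^{m}: \quad
\left( J_{f} \right)_{ij} = \frac{\partial f_{i}}{\partial x_{j}}
\ \ (\text{an } m \times n \text{ matrix})
```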

Back Propagation

  • Built on derivatives and chain rule
  • Jacobians in Back Propagation
    • Key to computing the gradients used in gradient descent (see the chain-rule identity after this list)
  • Error and Cost Minimization
    • Using derivatives to adjust weights and biases so that the cost decreases
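
One way to see why Jacobians matter here (a standard identity rather than a formula quoted from the lecture): the gradient of the cost with respect to one layer's activations is the transposed Jacobian of the next layer applied to the next layer's gradient, and applying this repeatedly carries the gradient from the output back through the network:

```latex
\nabla_{a^{l}} C
  = \left( \frac{\partial a^{l+1}}{\partial a^{l}} \right)^{\!\top} \nabla_{a^{l+1}} C,
\qquad \text{where } \frac{\partial a^{l+1}}{\partial a^{l}}
  \text{ is the Jacobian of layer } l{+}1 \text{ with respect to its inputs}
```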

Derivatives in Neural Networks

  • Scalar Expansion: A scalar expanded (broadcast) into a vector so it can take part in vector operations
  • Sum Derivatives: The derivative of a sum distributes over its terms, which is how each term's impact on the output cost is accounted for (both rules are stated after this list)
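
A compact statement of both rules, on the assumption that "scalar expansion" refers to a scalar x broadcast into every component of a vector y:

```latex
y = (x, x, \ldots, x)^{\top} \in \mathbb{R}^{n}
  \;\Rightarrow\; \frac{\partial y_{i}}{\partial x} = 1 \ \text{for each } i,
\qquad
s = \sum_{i=1}^{n} y_{i}
  \;\Rightarrow\; \frac{\partial s}{\partial y_{i}} = 1
  \ \text{and} \ \frac{\partial s}{\partial x} = \sum_{i=1}^{n} \frac{\partial y_{i}}{\partial x} = n
```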

Practical Applications

Gradient Descent

  • Learning Rate: Controls the step size of each gradient descent update
  • Stochastic Gradient Descent (SGD): More efficient on larger datasets
    • Updates on mini-batches instead of the full batch (see the sketch after this list)
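
A minimal sketch of the update and the mini-batch loop, assuming the parameters and their gradients are kept as lists of NumPy arrays (all names here are illustrative):

```python
import numpy as np

def sgd_step(params, grads, eta):
    """One gradient-descent update: step each parameter against its gradient,
    scaled by the learning rate eta."""
    return [p - eta * g for p, g in zip(params, grads)]

def iterate_minibatches(X, Y, batch_size, rng):
    """Shuffle the training set and yield mini-batches -- the 'stochastic' part of SGD."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], Y[idx]
```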

Back Propagation in Depth

  • Error of a Node: How the cost changes when the node's weighted input (its value before the activation function is applied) is perturbed
  • Four Fundamental Equations of Back Propagation
    • Error of the final layer
    • Propagation of the error backwards, layer by layer
    • Derivatives of the cost with respect to the weights and biases (the equations are stated after this list)
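
In the notation above, writing δ^l for the error vector of layer l, σ for the activation function, and ⊙ for the elementwise product, the four equations are commonly written as:

```latex
\delta^{L} = \nabla_{a^{L}} C \odot \sigma'\!\left(z^{L}\right),
\qquad
\delta^{l} = \left( \left( W^{l+1} \right)^{\!\top} \delta^{l+1} \right) \odot \sigma'\!\left(z^{l}\right),
\qquad
\frac{\partial C}{\partial b^{l}_{j}} = \delta^{l}_{j},
\qquad
\frac{\partial C}{\partial W^{l}_{jk}} = a^{l-1}_{k}\, \delta^{l}_{j}
```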

Implementation

  • Forward Pass: Calculate the activations layer by layer, storing the Z (weighted input) and A (activation) values
  • Backward Pass: Calculate the errors and use them to compute the weight and bias gradients (see the sketch after this list)
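
A minimal NumPy sketch of both passes for a fully connected network, assuming sigmoid activations and a quadratic cost (the activation, cost, and all names are illustrative assumptions, not specifics from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_pass(x, weights, biases):
    """Compute activations layer by layer, storing the Z and A values for the backward pass."""
    a = x
    zs, activations = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b            # weighted input of this layer
        a = sigmoid(z)           # activation of this layer
        zs.append(z)
        activations.append(a)
    return zs, activations

def backward_pass(y, weights, zs, activations):
    """Propagate the error backwards and return the gradients of every layer's W and b."""
    grad_W = [np.zeros_like(W) for W in weights]
    grad_b = [np.zeros_like(z) for z in zs]
    # Error of the final layer (quadratic cost): delta = (a^L - y) * sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_b[-1] = delta
    grad_W[-1] = delta @ activations[-2].T
    # Carry the error back through the remaining layers
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grad_b[-l] = delta
        grad_W[-l] = delta @ activations[-l - 1].T
    return grad_W, grad_b
```

The gradients returned by the backward pass are exactly what a gradient-descent update subtracts, scaled by the learning rate, from the weights and biases.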

Conclusion

  • Back propagation enables adjustment of weights/biases to minimize cost
  • Gradient Descent: Method to optimize neural networks
  • Application: Neural network training with libraries like PyTorch and TensorFlow (a minimal example follows)
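
For comparison only, a minimal PyTorch sketch of the same loop, where autograd performs the backward pass automatically (the layer sizes, cost, and learning rate are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 3), nn.Sigmoid(), nn.Linear(3, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 2)        # a mini-batch of 8 inputs
y = torch.randn(8, 1)        # matching targets

optimizer.zero_grad()        # clear previously accumulated gradients
loss = loss_fn(model(x), y)  # forward pass and cost
loss.backward()              # backward pass: back propagation via autograd
optimizer.step()             # gradient-descent update of the weights and biases
```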

Future Directions

  • Practical implementation with real data (e.g., MNIST)
  • Exploring more complex neural network types (e.g., convolutional networks)

This lecture explored the mathematical foundations necessary for understanding the back propagation algorithm used in training neural networks, providing valuable insights into the role of calculus in machine learning.