Overview
This lecture introduces the structure of neural networks using handwritten digit recognition as an example, focusing on how layers, neurons, weights, and activation functions operate.
Neural Network Structure and Purpose
- Neural networks can recognize handwritten digits from low-resolution images (28x28 pixels) with surprising accuracy.
- The input layer consists of 784 neurons, each representing a pixel's grayscale value from 0 (black) to 1 (white).
- The output layer has 10 neurons, each representing a digit (0–9); the highest activation indicates the network's digit prediction.
- Hidden layers (e.g., two layers of 16 neurons) process and abstract features between input and output, with their structure open to experimentation.
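To make the sizes concrete, here is a minimal NumPy sketch of the lecture's example architecture. The image here is random placeholder data rather than a real digit, and the layer sizes simply mirror the example above:

```python
import numpy as np

# Layer sizes from the lecture's example network:
# 784 input pixels -> 16 -> 16 -> 10 output digits.
layer_sizes = [784, 16, 16, 10]

# A 28x28 grayscale image (values in [0, 1]) becomes the input layer
# by flattening it into a vector of 784 activations.
image = np.random.rand(28, 28)  # placeholder for a real digit image
input_activations = image.flatten()

print(input_activations.shape)  # (784,)
```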
How Activations Are Computed
- Each neuron’s activation depends on a weighted sum of inputs from the previous layer; weights determine the importance of each input.
- A bias term is added to the weighted sum, shifting how large the sum must be before the neuron becomes meaningfully active.
- The weighted sum plus bias is passed through an activation function, commonly the sigmoid, which squashes the result into the range (0, 1).
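A minimal sketch of this computation for a single neuron, assuming NumPy; the weights, activations, and bias below are random stand-ins chosen purely for illustration:

```python
import numpy as np

def sigmoid(x):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# One neuron looking at the entire previous layer:
# activation = sigmoid(w . a + b)
prev_activations = np.random.rand(784)  # activations from the previous layer
weights = np.random.randn(784)          # one weight per incoming connection
bias = -5.0                             # shifts the threshold for activating

weighted_sum = np.dot(weights, prev_activations) + bias
activation = sigmoid(weighted_sum)
print(activation)  # a single value between 0 and 1
```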
Interpreting Layers and Feature Detection
- Lower layers may detect simple patterns like edges, while higher layers detect abstract features or digit subcomponents.
- The goal is for hidden layers to progressively recognize edges, patterns, and then digit identities.
Mathematical Representation
- Neuron activations from a layer are organized as a vector, weights as a matrix, and biases as a vector.
- The transition from one layer to the next is a matrix-vector product plus a bias vector, with the activation function applied elementwise: a′ = σ(Wa + b).
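A sketch of a full forward pass under this matrix view, assuming NumPy; the weights and biases are randomly initialized stand-ins, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [784, 16, 16, 10]

# One weight matrix and bias vector per layer transition;
# W has shape (next_layer_size, prev_layer_size).
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

def forward(a):
    # Each transition is a' = sigmoid(W a + b), applied elementwise.
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

output = forward(rng.random(784))  # ten activations, one per digit
print(output.argmax())             # index of the most active output neuron
```

With trained parameters, `output.argmax()` would be the network's digit prediction; with these random ones it is meaningless, but the shapes and the per-layer computation match the description above.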
Activation Functions: Sigmoid vs. ReLU
- The sigmoid function (“squishification”) maps any weighted sum into (0, 1), but it is rarely used in modern deep networks because it saturates for large inputs, making training slow and difficult.
- ReLU (Rectified Linear Unit) is now preferred: it outputs zero for negative inputs and passes positive inputs through unchanged, which tends to make deep networks easier to train.
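For a side-by-side comparison, both functions in a few lines of NumPy (the sample inputs are arbitrary; printed values are approximate):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Zero for negative inputs, identity for positive inputs: max(0, x).
    return np.maximum(0.0, x)

xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(sigmoid(xs))  # [0.047 0.269 0.5 0.731 0.953]
print(relu(xs))     # [0. 0. 0. 1. 3.]
```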
Key Terms & Definitions
- Neuron — A unit that holds a value (activation), typically between 0 and 1.
- Activation — The output value of a neuron, representing how "active" it is.
- Weight — A parameter that scales the influence of an input connection.
- Bias — An extra parameter added before applying the activation function to shift the output.
- Sigmoid Function — A mathematical function that maps any real value into (0,1).
- ReLU (Rectified Linear Unit) — An activation function defined as max(0, x).
- Layer — A group of neurons at the same stage in the network.
Action Items / Next Steps
- Watch the next video in the series to learn how neural networks learn, i.e., the training process.
- Optional: Review Chapter 3 of the instructor's linear algebra series for a deeper understanding of matrices and matrix multiplication.