Overview
This lecture covers the architecture and key characteristics of the VGG 16 convolutional neural network, focusing on its structure, parameter count, and feature extraction process.
VGG 16 Architecture Basics
- VGG 16 is a deep convolutional neural network with 16 weight layers (13 convolutional and 3 fully connected). It is inspired by AlexNet but considerably deeper.
- It contains approximately 138 million parameters, making it a large model.
- Every convolutional layer uses a 3x3 kernel with a stride of 1 and 'same' padding (1 pixel on each side).
- Because of this padding, every convolution preserves the height and width of its input. Only the number of channels changes.
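The size-preserving effect of 'same' padding follows from the standard output-size formula for a convolution or pooling window, floor((n + 2p - f) / s) + 1, where n is the input size, f the kernel size, p the padding, and s the stride. The sketch below is plain Python. `conv_output_size` is a hypothetical helper name and is not part of any library.

```python
def conv_output_size(n, f, p, s):
    """Spatial output size of a conv/pooling window: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


# 3x3 kernel, stride 1, padding 1 ('same'): height and width are unchanged.
print(conv_output_size(224, f=3, p=1, s=1))
# Without padding, the same convolution would shrink the map by 2 pixels.
print(conv_output_size(224, f=3, p=0, s=1))
```

The first call returns 224 and the second returns 222. Thirteen unpadded 3x3 convolutions would therefore shave 26 pixels off each side length, which is one reason VGG 16 pads every convolution and leaves all downsampling to the pooling layers.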
Layer Structure and Parameters
- VGG 16 stacks groups of 3x3 convolutions. Each group is followed by a 2x2 max pooling layer with stride 2, which halves the height and width.
- The number of channels (feature maps) grows as the spatial dimensions shrink. The five convolutional blocks use 64, 128, 256, 512, and 512 channels.
- For example, the input is a 224x224x3 RGB image. The first two convolutions raise the channel count from 3 to 64, giving 224x224x64, and max pooling then reduces this to 112x112x64.
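- Later layers have up to 512 channels. A single 3x3 convolution from 512 to 512 channels already has about 2.4 million parameters.
- The fully connected layers at the end hold most of the parameters. The first one maps the flattened 7x7x512 = 25,088 features to 4,096 units (about 102.8 million weights). It is followed by a 4096x4096 layer and a 4096x1000 output layer.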
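- Fully connected layers can also be expressed as convolutions. For dense evaluation at test time, the VGG paper converts the first one into a 7x7 convolution and the last two into 1x1 convolutions.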
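The parameter budget can be checked by walking the layer list and counting weights plus biases. This is a minimal sketch in plain Python. It assumes the standard ImageNet setup (224x224x3 input, 1000 classes) and the 16-layer configuration from the VGG paper. `VGG16_CONV`, `FC_SIZES`, `conv_params`, `fc_params`, and `count_vgg16` are illustrative names only.

```python
# VGG 16 (configuration "D" in the paper): integers are 3x3 conv output
# channels, "M" marks a 2x2 max pool with stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # two hidden FC layers + 1000-way classifier


def conv_params(in_ch, out_ch, k=3):
    """Weights (k * k * in_ch per filter, out_ch filters) plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch


def fc_params(n_in, n_out):
    """Weight matrix n_in x n_out plus one bias per output unit."""
    return n_in * n_out + n_out


def count_vgg16(image_size=224, in_ch=3):
    """Return (conv_params_total, fc_params_total) for VGG 16."""
    size, ch, conv_total = image_size, in_ch, 0
    for v in VGG16_CONV:
        if v == "M":
            size //= 2  # pooling halves H and W and has no parameters
        else:
            conv_total += conv_params(ch, v)
            ch = v
    n_in, fc_total = size * size * ch, 0  # flattened 7 * 7 * 512 = 25088
    for n_out in FC_SIZES:
        fc_total += fc_params(n_in, n_out)
        n_in = n_out
    return conv_total, fc_total


conv_total, fc_total = count_vgg16()
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {conv_total + fc_total:,}")

# The first FC layer has the same parameter count as a 7x7 conv with 4096 filters.
print(conv_params(512, 4096, k=7) == fc_params(7 * 7 * 512, 4096))
```

Running it reports 14,714,688 convolutional parameters, 123,642,856 fully connected parameters, and 138,357,544 in total. Roughly 89% of the parameters therefore sit in the three fully connected layers. The final check prints `True`. It confirms that the first fully connected layer and a 7x7 convolution with 4,096 filters over the 7x7x512 feature map have exactly the same number of parameters, which is why the two are interchangeable.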
Feature Extraction and Design Principle
- The network reduces the height and width of the feature maps while increasing the number of channels.
- Each output channel is produced by its own kernel, so different channels can learn to detect different features.
- In effect, the design trades spatial detail for feature depth. Deeper layers describe the image with more channels at a coarser spatial resolution.
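The trade-off between spatial size and channel count can be made concrete by tracing the output shape after each pooling stage. This plain-Python sketch assumes a 224x224x3 input and the VGG 16 layer list from the paper. `stage_shapes` is a hypothetical helper name.

```python
# Integers are 3x3 'same' conv output channels; "M" is a 2x2, stride-2 max pool.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def stage_shapes(image_size=224, in_ch=3):
    """Return (height, width, channels) after each max-pooling stage."""
    size, ch = image_size, in_ch
    shapes = []
    for v in VGG16_CONV:
        if v == "M":
            size //= 2  # pooling halves height and width
            shapes.append((size, size, ch))
        else:
            ch = v  # a 'same' conv keeps H and W and sets the channel count
    return shapes


for h, w, c in stage_shapes():
    print(f"{h:>3} x {w:>3} x {c:>3} = {h * w * c:>9,} values")
```

The five stages come out as 112x112x64, 56x56x128, 28x28x256, 14x14x512, and 7x7x512. Height and width halve at every stage while the channel count doubles until it reaches 512. As a result, the number of activation values per stage falls from 802,816 to 25,088.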
Key Terms & Definitions
- Convolutional Layer — layer applying kernels to extract features from input data.
- Max Pooling — operation reducing the size of feature maps by taking the maximum value over a region.
- Channel — a single feature map (one 2D slice of a layer's output); more channels let a layer capture more types of features.
- 'Same' Padding — padding added around the input so that the output of a convolution has the same height and width as the input (1 pixel per side for a 3x3 kernel with stride 1).
- Fully Connected Layer — layer where every input is connected to every output, typically used at the end of CNNs.
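To make the max pooling definition concrete, here is a minimal single-channel version of the 2x2, stride-2 pooling used in VGG 16, written in plain Python over nested lists. `max_pool_2x2` is a hypothetical name. Real frameworks provide optimized pooling layers that operate on whole batches of multi-channel tensors.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on one channel (a list of equal-length rows).

    Assumes even height and width, which holds at every pooling step in
    VGG 16 (224 -> 112 -> 56 -> 28 -> 14 -> 7).
    """
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
print(max_pool_2x2(fmap))  # [[4, 5], [6, 7]]
```

Each non-overlapping 2x2 window is replaced by its largest value, so the 4x4 map becomes 2x2. Pooling has no learnable parameters, which is why it does not contribute to the parameter count.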
Action Items / Next Steps
- Review the VGG 16 paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014).
- Watch the next lecture/video for a code implementation of VGG 16.