VGG 16 Architecture Overview

Overview

This lecture covers the architecture and key characteristics of the VGG 16 convolutional neural network, focusing on its structure, parameter count, and feature extraction process.

VGG 16 Architecture Basics

VGG 16 is a deep convolutional neural network with 16 weight layers (13 convolutional and 3 fully connected), inspired by AlexNet but deeper.
It contains approximately 138 million parameters, making it a large model.
Every convolutional layer uses 3x3 kernels with a stride of 1 and 'same' padding.
Because of this padding, each convolution preserves the height and width of its input; only the pooling layers reduce them.
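
The size-preserving behaviour follows from the standard output-size formula for a convolution, out = floor((n + 2p - k) / s) + 1. The sketch below checks it for the VGG 16 setting (k = 3, s = 1, p = 1). The helper name conv_output_size is hypothetical and chosen for illustration only.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

# 3x3 kernel, stride 1, one pixel of zero padding on each side ('same' padding)
same = conv_output_size(224, kernel=3, stride=1, padding=1)

# The same kernel without padding would shrink the map by 2 pixels per layer
valid = conv_output_size(224, kernel=3, stride=1, padding=0)

print(same, valid)  # 224 222
```

Without padding, 13 stacked 3x3 convolutions would remove 26 pixels from each spatial dimension, which is one practical reason a deep stack of small kernels relies on 'same' padding.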

Layer Structure and Parameters

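VGG 16 stacks two or three 3x3 convolutions in a row, then applies a 2x2 max pooling layer with stride 2 that halves the height and width.
The number of channels (feature maps) increases as the spatial dimensions decrease.
For example, the input is a 224x224 RGB image; the first two convolutions increase the channels from 3 to 64, and the max pooling that follows reduces the feature maps to 112x112.
Later layers have up to 512 channels, so a single 3x3 convolution there holds about 2.4 million weights.
The fully connected layers at the end are larger still: the first maps the flattened 7x7x512 output (25,088 values) to 4096 units, about 103 million weights, and the second is 4096x4096, about 16.8 million.
Fully connected layers can also be represented as convolutions; for instance, the first one is equivalent to a 7x7 convolution with 4096 output channels applied to the 7x7x512 feature map.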
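
To see where the parameters sit, the count can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming the 16-layer configuration from the paper (13 convolutions, 3 fully connected layers, a 1000-class output) and counting one bias per output channel or unit. The names CONV_CHANNELS, FC_SHAPES, conv_params, and fc_params are hypothetical.

```python
# Output channels of the 13 convolutional layers, in order
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
# (inputs, outputs) of the 3 fully connected layers; 7 * 7 * 512 = 25088
FC_SHAPES = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv_total = 0
c_in = 3  # RGB input
for c_out in CONV_CHANNELS:
    conv_total += conv_params(c_in, c_out)
    c_in = c_out

fc_total = sum(fc_params(n_in, n_out) for n_in, n_out in FC_SHAPES)
total = conv_total + fc_total

# The first fully connected layer viewed as a 7x7 convolution over 512 channels
fc1_as_conv = conv_params(512, 4096, k=7)

print(conv_total, fc_total, total)  # 14714688 123642856 138357544
print(fc1_as_conv == fc_params(7 * 7 * 512, 4096))  # True
```

Under these assumptions the three fully connected layers account for roughly 89% of all parameters, and the first of them has exactly as many parameters as a 7x7 convolution with 4096 output channels.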

Feature Extraction and Design Principle

The network reduces the height and width of the feature maps while increasing the number of channels.
Each output channel is produced by its own kernel, so different channels can learn to respond to different types of features.
The design thus trades spatial resolution for channel depth: from 224x224x3 at the input to 7x7x512 before the fully connected layers.
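
The trade between spatial resolution and channel count can be traced block by block. The sketch below assumes the five-block layout of the 16-layer configuration, with 'same' convolutions that change only the channel count and a 2x2, stride-2 max pooling at the end of each block. BLOCKS and trace_shapes are hypothetical names.

```python
# (number of 3x3 convolutions, output channels) for each of the five blocks
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def trace_shapes(size: int = 224, channels: int = 3) -> list:
    """Return (height, width, channels) at the input and after each block's pooling."""
    shapes = [(size, size, channels)]
    for _num_convs, out_channels in BLOCKS:
        channels = out_channels  # 'same' convolutions only change the channel count
        size //= 2               # 2x2 max pooling with stride 2 halves height and width
        shapes.append((size, size, channels))
    return shapes

for h, w, c in trace_shapes():
    print(f"{h:>3} x {w:>3} x {c:>3}")
```

The last shape printed is 7 x 7 x 512, which flattens to the 25,088 inputs of the first fully connected layer.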

Key Terms & Definitions

Convolutional Layer — layer applying kernels to extract features from input data.
Max Pooling — operation reducing the size of feature maps by taking the maximum value over a region.
Channel — a single feature map in a layer's output, produced by one kernel; more channels capture more types of features.
'Same' Padding — padding added to input so output size after convolution matches input size.
Fully Connected Layer — layer where every input is connected to every output, typically used at the end of CNNs.
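
As a small illustration of the max pooling definition, the sketch below applies 2x2 max pooling with stride 2 to a single 4x4 feature map. It uses plain Python lists for clarity rather than a deep learning library, and max_pool_2x2 is a hypothetical helper name.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]), 2)
        ]
        for r in range(0, len(fmap), 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each 2x2 region collapses to its largest value, so a 4x4 map becomes 2x2; the operation has no learnable parameters.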

Action Items / Next Steps

Review the VGG 16 paper by Simonyan and Zisserman: "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014).
Watch the next lecture/video for code implementation of VGG 16.