Introduction to Convolutional Neural Networks with TensorFlow

Jul 1, 2024

Speaker

  • Neil Leiser: Data Scientist at Iwaka (a fintech startup) and host of the AI Stories Podcast.

Event Organizer

  • Nathan Paccini: Marketing Manager at Data Science Dojo.

Overview

  • Introduction to convolutional neural networks (CNNs) with TensorFlow.
  • A brief hands-on tutorial is included in the presentation (referred to as part two).
  • Theoretical insights and practical guide on CNNs.

About Neil

  • Background: Born in Belgium, moved to London seven years ago, studied civil engineering at Imperial College.
  • Shift to Data Science: After 2019, completed a master's in data science in London; a research thesis on predicting solar panel output from satellite images led to a deep dive into CNNs.
  • Current Role: Builds machine learning algorithms for credit risk at Iwaka.
  • Podcast: Hosts the AI Stories Podcast, interviewing data professionals and tech leaders.

Agenda

  1. Brief introduction to AI and neural networks (5 minutes) to get everyone on the same page.
  2. Theory behind convolutional neural networks (CNNs) to build intuition for how they operate.
  3. Practical session using Google Colab to build a CNN algorithm with Python and TensorFlow.

Introduction to AI and Neural Networks

  • AlexNet Paper (2012): Pivotal in popularizing deep learning; set a new record in the ImageNet Challenge (image classification), cutting the error rate from 26% to 16%.
  • Historical Context: Neural networks since the 1980s; CNNs since the 1990s.
  • Key Factors for Success: Availability of computing power and large datasets by 2012.
  • Basics of Neural Networks:
    • Composed of neurons (nodes) connected by weights (edges).
    • Three key parts: Input layer, hidden layers, output layer.
    • Input layer: Input data; Hidden layers: Intermediate processing; Output layer: Final prediction.
    • Example: Toy neural network with 2 hidden layers of 4 neurons each, resulting in a total of 22 weights to train (sketched in code below).
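
A minimal sketch of such a toy network in TensorFlow/Keras. The two hidden layers of 4 neurons follow the example above; the input size (2) and output size (1) are assumptions, since the notes don't record them.

    import tensorflow as tf

    # Toy fully connected network: input -> two hidden layers of 4 neurons -> output.
    # The input size (2) and output size (1) are assumed for illustration.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(4, activation="relu"),  # hidden layer 1
        tf.keras.layers.Dense(4, activation="relu"),  # hidden layer 2
        tf.keras.layers.Dense(1),                     # output layer
    ])

    # Prints the layer shapes and the total number of trainable parameters.
    model.summary()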

Convolutional Neural Networks (CNNs)

  • Image Representation: Images are matrices of pixel values.
  • Why CNNs?:
    • Spatial Dependencies: CNNs focus on groups of pixels (e.g., edges, eyes) instead of individual pixels.
    • Efficiency: Far fewer parameters than a fully connected network applied to raw pixels, because kernel weights are shared across the image (see the comparison after this list).
  • CNN Architecture:
    • Input: Image matrix.
    • Feature Extraction (shown in blue in the architecture diagram):
      • Convolutional Layers: Apply kernels to extract features.
      • Pooling Layers: Reduce dimensionality and provide some invariance to small translations and rotations.
    • Classification: The extracted features feed a fully connected neural network that produces the final prediction.
  • Layers and Filters:
    • Initial layers capture fine details such as edges; later layers capture more complex patterns.
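
To make the efficiency claim concrete, here is a quick parameter-count comparison; the 28x28x1 input and the layer sizes are illustrative assumptions, not figures from the talk.

    import tensorflow as tf

    # A single 3x3 convolution with 32 filters vs. a dense layer of 32 neurons
    # applied to the flattened image. The convolution reuses the same 3x3
    # weights everywhere, so its parameter count is independent of image size.
    conv = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (3, 3)),   # 3*3*1*32 weights + 32 biases
    ])
    dense = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32),            # 784*32 weights + 32 biases
    ])
    print(conv.count_params())   # 320
    print(dense.count_params())  # 25120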

Convolution Operation

  • Kernel: Small matrix (filter) applied to the input image to extract features.
  • Example: A 3x3 kernel applied to a 5x5 image (demonstrated in the sketch after this list).
  • Components:
    • Stride: Step size for moving the kernel across the image.
    • Padding: Adding extra pixels to the image boundary to retain edge information.
  • Pooling Operation:
    • Max Pooling: Takes the maximum value from the region covered by the filter.
    • Average Pooling: Takes the average value from the region covered by the filter.
  • Architecture Example:
    • Layer 1: Convolution (32 filters, 3x3 kernel) -> Max pooling (2x2 pool size)
    • Layer 2: Convolution (64 filters, 3x3 kernel) -> Max pooling (2x2 pool size)
    • Fully connected neural network for final classification.
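
The sketch below walks through the 3x3-kernel-on-5x5-image example, showing how stride and padding affect the output shape, followed by max pooling. The pixel and kernel values are arbitrary, chosen only to illustrate the shapes.

    import tensorflow as tf

    # A 5x5 "image" and a 3x3 kernel, in TensorFlow's NHWC layout.
    image = tf.reshape(tf.range(25, dtype=tf.float32), (1, 5, 5, 1))
    kernel = tf.constant([[1., 0., -1.],
                          [1., 0., -1.],
                          [1., 0., -1.]])        # a simple vertical-edge filter
    kernel = tf.reshape(kernel, (3, 3, 1, 1))

    # Stride 1 with no padding ("VALID"): the output shrinks to 3x3.
    out_valid = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
    # Stride 1 with zero padding ("SAME"): the output stays 5x5, so edge
    # information is retained.
    out_same = tf.nn.conv2d(image, kernel, strides=1, padding="SAME")
    print(out_valid.shape, out_same.shape)  # (1, 3, 3, 1) (1, 5, 5, 1)

    # 2x2 max pooling with stride 2 roughly halves each spatial dimension.
    pooled = tf.nn.max_pool2d(out_same, ksize=2, strides=2, padding="VALID")
    print(pooled.shape)  # (1, 2, 2, 1)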

Practical Session: Google Colab

  • Libraries: Importing TensorFlow and the other required Python libraries.
  • Loading Data: Fashion MNIST dataset (60,000 training images, 10,000 test images).
  • Data Processing (sketched in code after this list):
    • Normalizing pixel values to the range [0, 1].
    • Expanding image dimensions (adding a channel axis) for CNN compatibility.
    • Splitting the training data into training and validation sets.
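
A sketch of these preprocessing steps. Fashion MNIST ships with TensorFlow; the validation split size below is an assumption, since the exact figure wasn't captured in the notes.

    import numpy as np
    import tensorflow as tf

    # Load Fashion MNIST: 60,000 training and 10,000 test images of size 28x28.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

    # Normalize pixel values from [0, 255] to [0, 1].
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0

    # Add a channel axis: (28, 28) -> (28, 28, 1), the input shape Conv2D expects.
    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)

    # Hold out part of the training data for validation (10% is an assumed split).
    n_val = 6000
    x_val, y_val = x_train[:n_val], y_train[:n_val]
    x_train, y_train = x_train[n_val:], y_train[n_val:]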

Building the CNN Model

  • Model Architecture:
    • Input layer specifying the shape (28x28x1).
    • Convolutional blocks, each a convolution followed by max pooling (assembled in the sketch after this list):
      • 32 filters, 3x3 kernel, stride 1 (ReLU activation).
      • 64 filters, 3x3 kernel, stride 1 (ReLU activation).
    • Fully connected layer (256 neurons, ReLU) -> Output layer with softmax activation (10 classes).
  • Compiling Model:
    • Optimizer: Adam optimizer.
    • Loss Function: Sparse categorical cross-entropy.
    • Metrics: Sparse categorical accuracy.
  • Training Model: Trained on the training data while validating on the held-out validation set.
  • Evaluation: Model performance on the test set (~89% accuracy).
  • Visualization: Inspecting correct and incorrect predictions with a confusion matrix and visual examples.
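
The sketch below assembles the full pipeline, continuing from the preprocessing sketch above (it reuses x_train, x_val, x_test, and the labels). Only the two convolutional blocks itemized above are included, and the epoch count is an assumed value.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), strides=1, activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
    ])

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )

    # Train on the training set while monitoring the validation set
    # (10 epochs is an assumed value).
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

    # Evaluate on the held-out test set; the talk reported roughly 89% accuracy.
    model.evaluate(x_test, y_test)

    # Inspect predictions with a confusion matrix.
    preds = model.predict(x_test).argmax(axis=1)
    print(tf.math.confusion_matrix(y_test, preds))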

Questions and Responses

  • Various questions on implementation, optimization techniques, pooling, kernel sizes, padding, and hyperparameter tuning were answered.
  • Specific insights into using CNNs for non-image data and combining image data with additional features were discussed.

Next Event

  • Topic: Crash Course on Designing a Dashboard in Tableau.
  • Date: July 27th, 9:30 AM Pacific.
  • Presenter: Irene Diome, Data Analyst and Tableau Ambassador.