Module 4 - Article 2 - Medium - Convolutional Neural Networks For Artificial Vision

Overview

The lecture covers the evolution from traditional machine learning to deep learning, focusing on convolutional neural networks (CNNs) and their application in artificial vision tasks such as image classification and object detection.

Traditional Machine Learning

Machine learning refers to algorithms that analyze data, learn from it, and make informed decisions.
Classic ML methods include Naive Bayes, Support Vector Machines, and K-Means, along with dimensionality-reduction techniques such as PCA and t-SNE.
Traditional vision systems use a two-step process: hand-crafted feature extraction and rule-based decision-making, which can be rigid and error-prone.
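The two-step process can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the two features, the threshold of 50, the labels, and the tiny grayscale images. It is not a method taken from the lecture.

```python
def extract_features(image):
    """Step 1: hand-crafted features from a grayscale image (list of rows, values 0-255)."""
    pixels = [p for row in image for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    # Edge strength: average absolute difference between horizontally adjacent pixels.
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    edge_strength = sum(diffs) / len(diffs)
    return {"mean_brightness": mean_brightness, "edge_strength": edge_strength}


def classify(features, edge_threshold=50.0):
    """Step 2: a fixed rule makes the decision; the threshold is an arbitrary assumption."""
    return "textured" if features["edge_strength"] > edge_threshold else "flat"


flat_image = [[100, 100, 100], [100, 100, 100]]
striped_image = [[0, 255, 0], [255, 0, 255]]

flat_label = classify(extract_features(flat_image))
striped_label = classify(extract_features(striped_image))
```

Both the features and the rule are fixed by hand, so any image that does not fit the designer's assumptions is misclassified. This is the rigidity the notes refer to.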

Deep Learning & Neural Networks

Deep learning uses multi-layer (deep) artificial neural networks (ANNs) to learn complex representations automatically.
ANNs have input, hidden, and output layers, learning to map inputs to outputs via weighted connections and nonlinear activation functions.
Key challenges for fully connected ANNs include exploding or vanishing gradients, very high parameter counts on image data, and the loss of spatial information when an image is flattened into a vector.
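A minimal sketch of the forward pass through a small fully connected network, followed by the parameter-count arithmetic for image input. The 2-3-1 layout, the hand-picked weights, the choice of ReLU and sigmoid, and the 224x224 RGB image with a 1000-neuron layer are all assumptions made for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: a weighted sum per neuron, then a nonlinearity."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

x = [1.0, 2.0]
hidden = dense_layer(x, hidden_w, hidden_b, relu)
y = dense_layer(hidden, output_w, output_b, sigmoid)


def dense_param_count(n_inputs, n_neurons):
    """Weights plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


# A 224x224 RGB image flattened and fed to a single 1000-neuron layer.
image_params = dense_param_count(224 * 224 * 3, 1000)  # 150,529,000 parameters
```

A single dense layer on a modest image already needs about 150 million parameters, and flattening the image discards which pixels were neighbours.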

Convolutional Neural Networks (CNNs)

CNNs are specialized deep learning models for grid-like data (e.g., images), inspired by animal visual cortex structure.
A CNN architecture contains convolution layers (with weight-sharing kernels), pooling layers (which reduce dimensionality), and fully connected layers (for the final classification).
Convolution layers detect patterns/features using kernels; pooling layers summarize and downsample; fully connected layers map features to output classes.
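The convolution and pooling steps can be sketched in plain Python. This is an illustration under simplifying assumptions: a single channel, stride 1, no padding, a hand-written vertical-edge kernel rather than a learned one, and a made-up 5x5 image. As in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# 5x5 image with a vertical edge: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]
vertical_edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d(image, vertical_edge_kernel)  # 3x3 map, strong response near the edge
pooled = max_pool2d(feature_map, size=2)           # 1x1 summary of the strongest response
```

The same nine kernel weights are reused at every position, which is what weight sharing means: the layer has 9 parameters regardless of image size.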

CNN Architectures & Applications

Common architectures: LeNet (early digit recognition), AlexNet (a deeper network that won the 2012 ImageNet challenge), ZFNet (a refinement of AlexNet with adjusted kernel sizes and strides).
CNNs outperform traditional ML on vision tasks: weight sharing and pooling give them a degree of translation invariance, which supports robust pattern recognition.
Training involves forward propagation, loss calculation, and backpropagation with gradient descent to optimize parameters.
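The training cycle can be sketched on the smallest possible model. This example uses a single sigmoid neuron rather than a CNN, so the backward pass reduces to one chain-rule step. The toy dataset, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy dataset: the label is 1 when the input is positive.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
learning_rate = 0.5
losses = []

for epoch in range(200):
    grad_w = grad_b = total_loss = 0.0
    for x, target in data:
        # Forward propagation: compute the prediction.
        pred = sigmoid(w * x + b)
        # Loss calculation: binary cross-entropy for this sample.
        total_loss += -(target * math.log(pred) + (1 - target) * math.log(1 - pred))
        # Backpropagation: for a sigmoid output with cross-entropy,
        # the gradient of the loss w.r.t. the pre-activation is (pred - target).
        error = pred - target
        grad_w += error * x
        grad_b += error
    n = len(data)
    # Gradient descent: step against the average gradient.
    w -= learning_rate * grad_w / n
    b -= learning_rate * grad_b / n
    losses.append(total_loss / n)
```

In a real CNN the same four stages apply, but the gradients are propagated backward through every convolution, pooling, and fully connected layer, usually by an automatic differentiation library.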

Object Detection Algorithms

Mask R-CNN and YOLO are advanced CNN-based object detectors.
Mask R-CNN provides instance segmentation and pixel-accurate masks but is slower and less effective for small objects.
YOLO is a single-stage, real-time object detector that divides an image into a grid and predicts boxes and classes in one pass; it is faster than Mask R-CNN and, according to the lecture, generally more robust.
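The grid idea can be made concrete with a small sketch. It assumes the convention of the original YOLO formulation, in which the grid cell containing an object's centre is responsible for predicting that object. The function name `responsible_cell`, the 7x7 grid, and the 448x448 image size are illustrative assumptions, not details given in the lecture.

```python
def responsible_cell(box, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell that contains the centre of the box.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Scale the centre into grid units and clamp so that a centre lying
    # exactly on the right or bottom border still maps to the last cell.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


cell = responsible_cell((100, 200, 300, 400), image_width=448, image_height=448)
# The box centre is (200, 300), which falls in row 4, column 3 of a 7x7 grid.
```

Because every cell makes its predictions in the same forward pass, no separate region-proposal stage is needed, which is where the speed advantage comes from.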

Key Terms & Definitions

Machine Learning — Algorithms that learn patterns from data for prediction or classification.
Deep Learning — Subset of ML using deep neural networks to learn features automatically.
ANN (Artificial Neural Network) — Network of neurons/layers modeling complex data relationships.
CNN (Convolutional Neural Network) — Deep network for grid data, using convolution, pooling, fully connected layers.
Convolution Layer — Applies kernels over input data to extract features.
Pooling Layer — Reduces dimensionality by summarizing feature maps.
Fully Connected Layer — Each neuron connects to all outputs of the previous layer for final prediction.
Backpropagation — Algorithm that computes the gradient of the loss with respect to each weight; gradient descent then uses these gradients to update the weights.
Mask R-CNN — Two-stage detection and segmentation model.
YOLO — Single-stage, real-time object detection model.

Action Items / Next Steps

Review CNN architectures (LeNet, AlexNet, ZFNet) and their differences.
Study the formulas for convolution output size and understand kernel/stride/padding roles.
Compare Mask R-CNN and YOLO for object detection use cases.
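For the output-size item above, the standard formula along one spatial dimension is floor((W - K + 2P) / S) + 1, where W is the input size, K the kernel size, P the padding, and S the stride. A minimal sketch follows; the function name and the example layer sizes are illustrative assumptions.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one dimension: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


no_padding = conv_output_size(32, 5)                       # 28: the kernel trims the border
same_padding = conv_output_size(32, 3, padding=1)          # 32: padding preserves the size
large_stride = conv_output_size(227, 11, stride=4)         # 55: stride downsamples
pooling_window = conv_output_size(28, 2, stride=2)         # 14: a 2x2 pool halves the size
```

The same formula covers pooling layers, since a pooling window is a kernel with its own size and stride.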