Deep Learning: Introduction and Key Concepts

Jul 10, 2024

Introduction

  • Deep Learning Importance
    • Revolutionizing many fields, achieving milestones previously thought impossible.
    • Examples include DeepMind's AlphaGo, cancer diagnosis, web translation, autonomous vehicles.

Course Overview

  • Core Topics:
    • Definition and distinction between artificial intelligence, machine learning, and deep learning.
    • Introduction to neural networks and their importance in deep learning.
    • Training deep learning models and the types of learning: supervised, unsupervised, reinforcement learning.
    • Key concepts: loss functions, optimizers, gradient descent, neural network architectures.

Historical Milestones in Deep Learning

  • 1997: IBM's Deep Blue defeats world chess champion Garry Kasparov.
  • 2011: IBM's Watson wins Jeopardy against top players.
  • 2016: Google DeepMind's AlphaGo defeats world champion Lee Sedol at Go.
  • Applications: Self-driving cars, fake news detection, earthquake prediction.

Fundamentals of Deep Learning

  • Definition: Subset of Machine Learning (ML), part of Artificial Intelligence (AI).
  • Machine Learning: Algorithms that teach computers to recognize patterns in data, similar to how humans do.
  • Challenges: Teaching machines to distinguish between objects like cats and dogs.

Neural Networks

  • Architecture: Layers of neurons, including input layer, hidden layers, output layer.
  • Learning Process: Forward propagation and back propagation.
    • Forward Propagation: Input processed through layers to generate output.
    • Back Propagation: Propagates the error backward through the network, adjusting weights and biases to minimize the loss function.
  • Training: Iteratively adjusting weights and biases over many examples to reduce prediction error (see the sketch after this list).
  • Example: Classifying vehicle types, where input features (such as the goods carried) are weighted and combined through the layers to reach a final classification.
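
Below is a minimal NumPy sketch of one training step for a tiny network; the layer sizes, input data, and learning rate are illustrative assumptions rather than values from the lecture. Forward propagation computes a prediction, back propagation computes gradients of the loss, and a gradient-descent update adjusts the weights and biases.

    import numpy as np

    # Tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([[0.5, -1.2, 3.0]])   # one input example
    y = np.array([[1.0]])              # its target label

    # Forward propagation: input flows through the layers to an output
    h = sigmoid(x @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    loss = 0.5 * np.sum((y_hat - y) ** 2)   # squared-error loss

    # Back propagation: push the error backwards to get gradients
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1 = x.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient-descent update: adjust weights and biases to reduce the loss
    lr = 0.1
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    print(float(loss))   # repeating this step shrinks the loss over time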

Key Concepts

  • Activation Functions
    • Introduce non-linearity into the network, allowing it to model complex functions.
    • Types: Step Function, Linear Function, Sigmoid, TanH, ReLU, Leaky ReLU.
    • Sigmoid: Outputs values between 0 and 1 but can cause the vanishing gradient problem.
    • TanH: Similar to sigmoid but ranges from -1 to 1 and is zero-centered.
    • ReLU: Outputs the input if positive, otherwise 0; efficient but can lead to the 'dying ReLU' problem (see the activation sketch after this list).
  • Loss Functions: Quantifies difference between predicted and actual output (e.g., squared error loss, cross-entropy).
  • Optimizers
    • Gradient Descent: Iteratively adjusts weights in the direction that reduces the loss function (see the gradient-descent sketch after this list).
    • Variants: Stochastic Gradient Descent, AdaGrad, RMSProp, Adam.
    • Learning Rate: Controls step size in gradient descent.
  • Model Parameters vs Hyperparameters
    • Parameters: Internal values (weights, biases) estimated from data.
    • Hyperparameters: External configurations set manually (e.g., learning rate, number of epochs).
  • Epochs, Batch Size, Iterations
    • Epoch: One pass through the entire dataset.
    • Batch: Subset of data processed in one step.
    • Iterations: Number of batches processed per epoch (dataset size divided by batch size).
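
A short NumPy sketch of the activation functions listed above; the Leaky ReLU slope of 0.01 is a common illustrative default, not a value from the notes.

    import numpy as np

    def step(z):        # 0 or 1; non-differentiable, rarely used in hidden layers
        return (z > 0).astype(float)

    def sigmoid(z):     # squashes to (0, 1); saturates, so gradients can vanish
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):        # squashes to (-1, 1); zero-centered version of sigmoid
        return np.tanh(z)

    def relu(z):        # passes positives through, zeroes out negatives
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):   # small negative slope avoids "dying ReLU"
        return np.where(z > 0, z, alpha * z)

    z = np.linspace(-3, 3, 7)
    print(relu(z))        # [0. 0. 0. 0. 1. 2. 3.]
    print(sigmoid(0.0))   # 0.5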
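
A second sketch shows gradient descent with a learning rate on a toy loss, plus how epochs, batch size, and iterations relate; the quadratic loss and all counts here are illustrative assumptions.

    import numpy as np

    # Toy loss: L(w) = (w - 3)^2, minimized at w = 3; its gradient is 2(w - 3)
    w = 0.0
    learning_rate = 0.1            # hyperparameter: controls the step size
    for t in range(50):
        grad = 2 * (w - 3)
        w -= learning_rate * grad  # step opposite the gradient
    print(w)                       # close to 3.0

    # Epochs, batches, iterations
    dataset_size = 10_000
    batch_size = 32
    iterations_per_epoch = int(np.ceil(dataset_size / batch_size))  # 313 batches
    epochs = 5                                                      # 5 full passes
    total_iterations = epochs * iterations_per_epoch                # 1565 updates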

Types of Learning

  • Supervised Learning
    • Training with labeled data to map input to output (e.g., classification and regression).
  • Unsupervised Learning
    • Finding patterns in unlabeled data (e.g., clustering, association).
  • Reinforcement Learning
    • Learning through rewards and punishments to maximize overall reward.

Preventing Overfitting

  • Overfitting: Model performs well on training data but poorly on new data.
  • Regularization Techniques
    • Dropout: Randomly dropping neurons during training so they do not co-adapt (see the sketch after this list).
    • Data Augmentation: Generating new data from existing data to enhance training set.
    • Early Stopping: Halting training when validation error begins to rise.
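
A minimal sketch of two of these regularization techniques in plain NumPy/Python; the drop probability, patience value, and the simulated validation losses are assumptions for illustration.

    import numpy as np

    def dropout(activations, p_drop=0.5, training=True):
        # Inverted dropout: randomly zero units during training, rescale the rest
        if not training:
            return activations                      # no dropout at inference time
        mask = (np.random.rand(*activations.shape) > p_drop)
        return activations * mask / (1.0 - p_drop)  # keeps the expected value

    # Early stopping: halt when validation error stops improving
    # (a simulated loss curve stands in for a real training loop)
    val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.65]
    best_val_loss = float("inf")
    patience, bad_epochs = 3, 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_val_loss:
            best_val_loss, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # rose 3 epochs in a row
                print(f"early stop at epoch {epoch}")
                break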

Neural Network Architectures

  • Feed-forward Networks: Simplest form; no cycles, with each layer's outputs feeding the next layer.
  • Recurrent Neural Networks (RNNs): For sequence data, includes feedback loops to remember past information (e.g., text prediction).
    • Challenges: Vanishing gradients limit how far back the network can remember (short-term memory).
    • Variants: LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit).
  • Convolutional Neural Networks (CNNs): For image data; convolutional layers extract local features and pooling layers reduce dimensionality (see the sketch after this list).
    • Applications: Image recognition, segmentation, video analysis.
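
A minimal PyTorch sketch of the three architecture families, assuming PyTorch is available; all layer sizes and input shapes are illustrative.

    import torch
    import torch.nn as nn

    # Feed-forward network: data flows in one direction, no cycles
    feed_forward = nn.Sequential(
        nn.Linear(10, 32), nn.ReLU(),
        nn.Linear(32, 2),
    )

    # Recurrent network (LSTM variant): keeps a hidden state across time steps
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

    # Convolutional network: convolutions extract features, pooling shrinks them
    cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                      # halves the spatial resolution
        nn.Flatten(),
        nn.Linear(16 * 14 * 14, 10),          # assumes 28x28 input images
    )

    x_seq = torch.randn(1, 5, 8)              # one fake sequence of 5 steps
    x_img = torch.randn(1, 3, 28, 28)         # one fake RGB image
    print(feed_forward(torch.randn(1, 10)).shape)  # torch.Size([1, 2])
    print(lstm(x_seq)[0].shape)                    # torch.Size([1, 5, 16])
    print(cnn(x_img).shape)                        # torch.Size([1, 10])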

Practical Steps in Deep Learning Projects

  1. Data Collection: Gather sufficient, high-quality data relevant to the problem.
  2. Data Preprocessing:
    • Splitting data into training, validation, and test sets.
    • Handling missing data and imbalanced data.
    • Feature scaling and normalization (see the preprocessing sketch after this list).
  3. Model Training: Train the chosen architecture, adjusting weights through back propagation and evaluating on the validation set.
  4. Model Evaluation: Testing against unseen data to check accuracy and generalization.
  5. Optimization: Tuning hyperparameters and applying regularization techniques to improve performance.
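
A short scikit-learn sketch of the splitting and scaling steps from the preprocessing stage above; the synthetic data, split ratios, and random seed are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Synthetic dataset: 1000 examples, 20 features, binary labels
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = rng.integers(0, 2, size=1000)

    # Split into 70% train, 15% validation, 15% test
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    # Feature scaling: fit on the training set only, then apply everywhere
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(X_test)

    print(X_train.shape, X_val.shape, X_test.shape)  # (700, 20) (150, 20) (150, 20)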