Machine Learning Algorithms Overview

Aug 21, 2024

AI and ML for Geodata Analysis: Session 2 Notes

Introduction

  • Today's topic: Machine Learning Algorithms
  • Presenter: Dr. Punam Sayari
  • Reminder for participants to watch the session via YouTube and complete quizzes for certification.

Importance of Data

  • Quote from Professor GTH James: "For the 21st century, data is the sword; handling it properly makes one a samurai."
  • Need for advanced algorithms to handle large data sets.

Definitions

Machine Learning (ML)

  • Subset of Artificial Intelligence (AI)
  • Automates the building of analytical models from data.
  • Learns relationships between input and output data, then uses them to predict outcomes.

Deep Learning

  • Subset of ML using deep neural networks.
  • Uses multiple stacked layers for feature extraction and prediction.

Key Differences Between ML and Deep Learning

  1. Problem-Solving Approach
    • ML: Features are manually extracted.
    • Deep Learning: Minimal human intervention; features are learned automatically.
  2. Training Methods
    • ML: Supervised, unsupervised, reinforcement learning, etc.
    • Deep Learning: Uses specialized architectures (autoencoders, CNNs, RNNs).
  3. Algorithm Complexity
    • ML: Varied complexity.
    • Deep Learning: Complex architectures of interconnected neurons.
  4. Data Requirements
    • ML: Can work on smaller datasets.
    • Deep Learning: Requires large datasets and significant computational power.

Types of Machine Learning Algorithms

1. Supervised Learning

  • Uses labeled data for training.
  • Examples: Classification and regression tasks.
  • Classification: Assigning labels to input data (e.g., land cover classification).
  • Regression: Predicting continuous output values.
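
As a minimal sketch of the regression case, the snippet below fits a straight line to labeled pairs by ordinary least squares and then predicts a continuous value for a new input. The function name fit_line and the elevation/temperature numbers are made up for illustration and are not from the session.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: elevation (km) as input, temperature (deg C) as output
elevation = [0.0, 0.5, 1.0, 1.5, 2.0]
temperature = [30.0, 27.0, 24.0, 21.0, 18.0]

slope, intercept = fit_line(elevation, temperature)

# Predict a continuous output for an input that was not in the training data
predicted = slope * 1.2 + intercept
```

The labels here are the known temperatures; the model is "supervised" because the fit is driven by those known outputs.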

2. Unsupervised Learning

  • No labeled data; finds patterns on its own.
  • Clustering: Groups similar data points together.
  • Examples: K-means clustering, hierarchical clustering.

3. Semi-supervised Learning

  • Combines a small amount of labeled data with a large amount of unlabeled data.

4. Reinforcement Learning

  • A software agent learns to make decisions through trial and error.
  • Actions are rewarded or penalized so that the agent learns to optimize outcomes.
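
The trial-and-error loop can be illustrated with a very small sketch. It assumes an epsilon-greedy strategy over two actions with fixed feedback, which is only one of many reinforcement learning setups; train_agent, the reward values, and the parameters are all hypothetical.

```python
import random


def train_agent(rewards, steps=200, epsilon=0.1, alpha=0.1, seed=0):
    """Epsilon-greedy value learning over a fixed set of actions."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)  # estimated value of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the best current estimate
            action = max(range(len(q)), key=lambda a: q[a])
        reward = rewards[action]  # feedback from the environment
        q[action] += alpha * (reward - q[action])  # nudge estimate toward feedback
    return q


# Hypothetical environment: action 0 is penalized (-1), action 1 is rewarded (+1)
q_values = train_agent(rewards=[-1.0, 1.0])
best_action = max(range(len(q_values)), key=lambda a: q_values[a])
```

After training, the estimate for the rewarded action is higher than for the penalized one, so the agent prefers it.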

Common Machine Learning Algorithms

Supervised Learning Algorithms

  1. Parallelepiped Classification Algorithm: Fast and simple classification method that bounds each class using the means and standard deviations of its training data.
  2. Minimum Distance to Means Classification: Assigns pixels based on the shortest distance to class means.
  3. Mahalanobis Decision Rule: Distance-based rule that accounts for class covariance; effective when classes overlap.
  4. Maximum Likelihood Classification: Based on class probability distributions; often the most accurate of these four but computationally intensive.
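
The first two rules in this list can be sketched in a few lines. This is an illustrative version under simple assumptions: two-band pixels, Euclidean distance for the minimum-distance rule, and a box of mean plus or minus k standard deviations for the parallelepiped rule. The training pixels, class names, and the choice k=2.0 are hypothetical.

```python
import math


def class_stats(samples):
    """Per-band mean and population standard deviation of training pixels."""
    n = len(samples)
    bands = range(len(samples[0]))
    means = [sum(s[b] for s in samples) / n for b in bands]
    stds = [math.sqrt(sum((s[b] - means[b]) ** 2 for s in samples) / n) for b in bands]
    return means, stds


def minimum_distance(pixel, stats):
    """Assign the class whose mean vector is closest to the pixel."""
    return min(stats, key=lambda name: math.dist(pixel, stats[name][0]))


def parallelepiped(pixel, stats, k=2.0):
    """Assign the first class whose mean +/- k*std box contains the pixel."""
    for name, (means, stds) in stats.items():
        if all(abs(p - m) <= k * s for p, m, s in zip(pixel, means, stds)):
            return name
    return None  # pixel lies outside every box and stays unclassified


# Hypothetical two-band training pixels for two land cover classes
training = {
    "water": [(10, 5), (12, 7), (11, 6), (9, 6)],
    "vegetation": [(30, 80), (32, 85), (28, 78), (30, 81)],
}
stats = {name: class_stats(pixels) for name, pixels in training.items()}

label_md = minimum_distance((29, 79), stats)
label_pp = parallelepiped((29, 79), stats)
label_far = parallelepiped((60, 40), stats)
```

Note the difference in behavior: the minimum-distance rule always assigns some class, while the parallelepiped rule can leave a pixel unclassified when it falls outside every box.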

Decision Tree Classifier

  • Classifies data through a sequence of binary decisions.
  • Easy to understand and interpret.
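
A single binary decision of the kind a tree is built from can be sketched as follows. The code searches for the threshold on one feature that misclassifies the fewest training samples; a tree learner would typically repeat a search like this on each branch. The function names, the NDVI-like values, and the labels are assumptions for illustration, and the sketch expects at least two distinct feature values.

```python
def best_split(values, labels):
    """Find the one-feature threshold with the fewest misclassified samples.

    Returns (threshold, label_below, label_above).
    """
    classes = sorted(set(labels))
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for t in candidates:
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if v <= t else above) != y
                    for v, y in zip(values, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, t, below, above)
    return best[1:]


def classify(value, threshold, below, above):
    """Apply the learned binary decision to a new value."""
    return below if value <= threshold else above


# Hypothetical vegetation-index values with known land cover labels
ndvi = [0.05, 0.10, 0.15, 0.55, 0.60, 0.70]
cover = ["bare", "bare", "bare", "veg", "veg", "veg"]

threshold, below, above = best_split(ndvi, cover)
label = classify(0.65, threshold, below, above)
```

The learned rule reads directly as "if the value is at or below the threshold, assign one class, otherwise the other", which is why trees are easy to interpret.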

Random Forest Classifier

  • Ensemble method using multiple decision trees to improve accuracy and reduce overfitting.
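
The ensemble idea can be sketched in simplified form. This version makes two simplifying assumptions that a full random forest does not: each "tree" is a single split (a stump), and each tree looks at one randomly chosen feature. It keeps the two core ingredients, bootstrap sampling and majority voting. All names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """One-split tree on one feature: (feature, threshold, below, above)."""
    values = [r[feature] for r in rows]
    classes = sorted(set(labels))
    ordered = sorted(set(values))
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback when no split is possible: always predict the majority class
    best = (len(labels) + 1, ordered[0], majority, majority)
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if v <= t else above) != y
                    for v, y in zip(values, labels)
                )
                if errors < best[0]:
                    best = (errors, t, below, above)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(below if row[f] <= t else above for f, t, below, above in forest)
    return votes.most_common(1)[0][0]


# Hypothetical two-band pixels with land cover labels
rows = [(10, 5), (12, 7), (11, 6), (9, 6), (30, 80), (32, 85), (28, 78), (30, 81)]
labels = ["water"] * 4 + ["vegetation"] * 4

forest = fit_forest(rows, labels)
label_a = predict(forest, (11, 6))
label_b = predict(forest, (31, 82))
```

Because each tree sees a different resample of the data, an unlucky individual tree is outvoted by the rest, which is the intuition behind the reduced overfitting.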

Support Vector Machines (SVM)

  • Finds optimal hyperplanes for classification tasks; effective in high-dimensional spaces.

Artificial Immune Networks & Logistic Regression

  • Artificial immune networks are applied to complex computational problems; logistic regression is used for basic classification tasks.

Unsupervised Learning Algorithms

  • K-means Clustering: Assigns data points to clusters based on distance from centroids.
  • Advantages: Automatically groups data based on similarities.
  • Disadvantages: Typically less accurate than supervised methods because no labeled data guides the grouping.
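
A minimal sketch of K-means follows, assuming the common formulation that alternates an assignment step and a centroid update step for a fixed number of iterations. The points and the choice of initial centroids are hypothetical; in practice the initial centroids are often picked at random.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate nearest-centroid assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical unlabeled 2-D points forming two loose groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Initial centroids taken from the first and fourth points (an arbitrary choice)
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied at any point; the cluster indices in assignments are arbitrary identifiers that an analyst still has to interpret.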

Conclusion

  • Select the algorithm according to the problem, the available data, and the desired outcomes.
  • The session continues with Q&A after a short break.