Lecture Notes on AI and ML for Geodata Analysis

Introduction

Quote by Professor GTH James: "For the 21st century, data is the sword, and whoever can handle data properly will be called a samurai."
The era of big data demands advanced algorithms.

Machine Learning (ML) is a subset of Artificial Intelligence (AI).
It automates tasks by learning from data, without explicit programming for each task.
Its algorithms learn the relationships between input and output datasets.

Deep Learning is a further subset of machine learning that uses deep neural networks.
It uses multiple layers to learn from datasets.
Model depth is the number of layers; deeper models can perform better, but not always.

Problem-Solving Approach:
- Machine Learning: Manual feature extraction.
- Deep Learning: Minimal human intervention in feature extraction.
Training Methods:
- Machine Learning: Supervised, unsupervised, reinforcement, etc.
- Deep Learning: Autoencoders, CNNs, RNNs, etc.
Complexity of Algorithms:
- Diverse algorithms in ML; complex architecture in Deep Learning.
Data Interpretation:
- ML typically requires structured data; Deep Learning can handle unstructured data.
Infrastructure Needs:
- ML can often run on a single instance; Deep Learning typically requires large storage and computational power.

Machine Learning Definition:
- A method for automating data analysis.
- Collects multiple examples to train the system.
Types of Machine Learning Algorithms:
- Supervised Learning: Uses labeled data for training.
- Unsupervised Learning: Discovers patterns in unlabeled data.
- Reinforcement Learning: System learns based on feedback (reward/penalty).

Classification: Assigns each input to one of a finite set of classes (e.g., land use categories).
Regression: Outputs continuous values (e.g., numerical weather modeling).
Clustering: Groups similar data points (unsupervised learning).
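
To make the contrast with classification concrete, here is a minimal regression sketch: an ordinary least-squares line fitted to one input variable. The elevation and temperature values are invented for illustration, and fit_line and predict are hypothetical helper names, not part of any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Return the continuous output for a new input value."""
    return slope * x + intercept

# Made-up data: elevation in metres, air temperature in degrees Celsius.
elevation = [0, 500, 1000, 1500]
temperature = [20.0, 17.0, 14.0, 11.0]

slope, intercept = fit_line(elevation, temperature)
estimate = predict(2000, slope, intercept)
```

The output is a number on a continuous scale rather than a class label, which is the distinction drawn above.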

Training Process:
- Data split into training, validation, and testing sets.
Example of Classification:
- Training using representative pixels for different land use categories.
Disadvantages:
- Relies on labeled data, which can be costly and time-consuming to obtain.
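
The three-way data split mentioned above can be sketched as follows. The 70/15/15 proportions, the fixed random seed, and the helper name split_dataset are assumptions chosen for illustration; the notes do not prescribe particular values.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation, and testing lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test

# Made-up labeled pixels: (band values, land use label).
pixels = [((i, 2 * i), "water" if i % 2 == 0 else "forest") for i in range(20)]
train, val, test = split_dataset(pixels)
```

The model is fitted on the training list, tuned on the validation list, and evaluated once on the testing list.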

Parallelepiped Classification Algorithm:
- Classifies a pixel by testing whether its values fall inside a box defined for each class from the class means and standard deviations.
Minimum Distance to Means Classification:
- Assigns each pixel to the class whose mean is closest.
Mahalanobis Decision Rule:
- Considers the covariance of classes to improve classification accuracy.
Maximum Likelihood Classification:
- Assigns each pixel to the class with the highest likelihood under an assumed statistical distribution; often accurate but computationally intensive.
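
A minimal sketch of the first three decision rules above, for pixels with two spectral bands. The training pixels, the class names, the box half-width k = 2 standard deviations, and all function names are assumptions made for illustration. Maximum likelihood is omitted to keep the example short.

```python
import math

# Made-up training pixels: two band values per pixel, grouped by class.
training = {
    "water":  [(10.0, 20.0), (12.0, 22.0), (11.0, 19.0), (13.0, 23.0)],
    "forest": [(40.0, 60.0), (42.0, 64.0), (38.0, 58.0), (44.0, 62.0)],
}

def class_stats(pixels):
    """Per-band mean and population standard deviation for one class."""
    n = len(pixels)
    bands = list(zip(*pixels))
    means = [sum(band) / n for band in bands]
    stds = [math.sqrt(sum((v - m) ** 2 for v in band) / n)
            for band, m in zip(bands, means)]
    return means, stds

stats = {name: class_stats(pixels) for name, pixels in training.items()}

def parallelepiped(pixel, k=2.0):
    """Return the first class whose mean +/- k*std box contains the pixel, else None."""
    for name, (means, stds) in stats.items():
        if all(abs(v - m) <= k * s for v, m, s in zip(pixel, means, stds)):
            return name
    return None  # pixel falls outside every box and stays unclassified

def minimum_distance(pixel):
    """Return the class whose mean is closest in Euclidean distance."""
    def distance(name):
        means = stats[name][0]
        return math.sqrt(sum((v - m) ** 2 for v, m in zip(pixel, means)))
    return min(stats, key=distance)

def covariance_2x2(pixels, means):
    """Population variances and covariance of the two bands."""
    n = len(pixels)
    m1, m2 = means
    s11 = sum((a - m1) ** 2 for a, b in pixels) / n
    s22 = sum((b - m2) ** 2 for a, b in pixels) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pixels) / n
    return s11, s12, s22

def mahalanobis(pixel):
    """Return the class with the smallest squared Mahalanobis distance (two bands only)."""
    def squared_distance(name):
        means = stats[name][0]
        s11, s12, s22 = covariance_2x2(training[name], means)
        det = s11 * s22 - s12 * s12
        d1, d2 = pixel[0] - means[0], pixel[1] - means[1]
        # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
        return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return min(stats, key=squared_distance)
```

With this sample data, the pixel (25.0, 40.0) lies outside both boxes, so parallelepiped returns None. The minimum distance rule assigns it to water, while the Mahalanobis rule assigns it to forest, because the forest class has the larger spread. This is the sense in which accounting for covariance can change the decision.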

Decision Trees:
- Series of binary decisions for classification.
Random Forest Classifier:
- Ensemble method using multiple trees to improve accuracy and reduce overfitting.
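
A single binary decision in a tree can be sketched as a search for the threshold that best separates the classes. This example uses Gini impurity as the split criterion, which is one common choice and an assumption here; the vegetation index values and function names are also invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one feature that minimizes the weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

def majority_vote(predictions):
    """Combine the predictions of several trees into one label."""
    return Counter(predictions).most_common(1)[0][0]

# Made-up vegetation index values with land use labels.
ndvi = [0.05, 0.10, 0.15, 0.60, 0.70, 0.80]
labels = ["urban", "urban", "urban", "forest", "forest", "forest"]
threshold, impurity = best_split(ndvi, labels)
vote = majority_vote(["forest", "urban", "forest"])
```

A full decision tree repeats best_split recursively on each side of the threshold. A random forest trains many such trees on resampled data and combines them with a vote, as in majority_vote; the resampling step is not shown here.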

Support Vector Machine (SVM) for Classification and Regression Tasks:
- Finds the optimal hyperplane that separates classes in a high-dimensional space.
Advantages:
- Effective in high-dimensional spaces; memory efficient.
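
The separating hyperplane idea above underlies support vector machines. The sketch below shows only how a hyperplane, once found, classifies a point; it does not implement the optimization that finds it. The weight vector w, offset b, and function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical hyperplane w . x + b = 0; in practice a solver learns w and b from data.
w = [2.0, -1.0]
b = -1.0

def decision_value(x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(x) >= 0 else -1

def distance_to_hyperplane(x):
    """Geometric distance from x to the hyperplane."""
    return abs(decision_value(x)) / math.sqrt(sum(wi * wi for wi in w))

label_a = classify([2.0, 1.0])
label_b = classify([0.0, 1.0])
dist_a = distance_to_hyperplane([2.0, 1.0])
```

Among all hyperplanes that separate the training classes, the optimal one is the one that maximizes this distance for the closest training points.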

Clustering:
- Groups similar data points without labeled training data.
K-Means Clustering:
- Assigns data points to clusters based on distance from a centroid.
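
K-means can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sample points, the initial centroids, and the fixed iteration count are assumptions for illustration; real implementations usually pick initial centroids randomly and stop once assignments no longer change.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update steps for a fixed number of rounds."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up 2D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

No labels are supplied at any point. The grouping comes only from distances between the points, which is what makes this an unsupervised method.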

Supervised Learning:
- Requires labeled data; generally more accurate and simpler to evaluate.
Unsupervised Learning:
- Needs no labeled data; generally less accurate but can discover natural patterns.
Semi-Supervised:
- Combines small labeled data with a larger unlabeled dataset.
Reinforcement Learning:
- Learns through trial and error and feedback.