Introduction to Machine Learning with Eric Grimson

Jul 12, 2024

Announcements

  • Last class in two weeks.

Linear Regression Recap

  • Used for deducing models from data. Example: modeling a spring's displacement as a function of the weight applied.
  • Models can be linear, quadratic, cubic, etc. (see the fitting sketch below).
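
A minimal sketch of such a fit, using numpy.polyfit on invented spring data (the displacement and force values are illustrative, not from the lecture):

```python
import numpy as np

# Invented (displacement, force) observations for a spring, roughly F = k*x.
displacements = np.array([0.00, 0.05, 0.10, 0.15, 0.20, 0.25])
forces = np.array([0.00, 0.49, 0.98, 1.47, 1.96, 2.45])

# Fit models of increasing degree: linear, quadratic, cubic.
linear = np.polyfit(displacements, forces, 1)
quadratic = np.polyfit(displacements, forces, 2)
cubic = np.polyfit(displacements, forces, 3)

print("estimated spring constant k:", linear[0])  # slope of the linear fit
```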

Transition to Machine Learning

  • Final topic of the course.
  • Reading Assignment: Chapter 22.
  • Multi-disciplinary relevance (NLP, computational biology, computer vision, robotics).
  • Not covering advanced methods like convolutional neural nets or deep learning.

Applications of Machine Learning

  • AlphaGo: Beat world-class Go players using ML.
  • Netflix/Amazon: Recommendation systems.
  • Google Ads: Personalized ads using preferences.
  • Drug Discovery, Character Recognition, Hedge Funds, Assistive/Autonomous Driving, Face Recognition, Cancer Diagnosis (IBM Watson).

Definition of Machine Learning

  • Quote from Arthur Samuel (1959): “Field of study that gives computers the ability to learn without being explicitly programmed.”
  • Difference between traditional programming (write the program explicitly) and machine learning (derive the program from data).

Essential Concepts in Machine Learning

  • Training Data: Examples with features.
  • Features: Attributes representing examples.
  • Distance: Measure to group similar examples.
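
As a small sketch of these ideas, each example below is a feature vector, and a distance function measures similarity (the height/weight numbers are invented):

```python
# Each example is a feature vector: (height in inches, weight in pounds).
def euclidean(v1, v2):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

example_a = (70, 180)
example_b = (72, 190)
example_c = (77, 315)

# Smaller distance = more similar: a and b group together, c stands apart.
print(euclidean(example_a, example_b))
print(euclidean(example_a, example_c))
```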

Types of Learning

  1. Supervised Learning: With labeled data.
  2. Unsupervised Learning: Without labeled data.

Example: Patriots Players Classification

  • Features: Height and Weight of football players.
  • Unsupervised Learning: Cluster similar examples.
  • Supervised Learning: Use known labels to classify.

Clustering Example

  • Initially pick examples as cluster centers.
  • Iteratively assign examples to clusters and update centers (median, not mean).
  • Euclidean vs. Manhattan distance metrics.
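
A minimal sketch of that loop on invented (height, weight) data, using Euclidean distance and coordinate-wise medians as described above:

```python
import statistics

def euclidean(v1, v2):
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

# Invented (height, weight) examples; two initial centers picked arbitrarily.
points = [(67, 170), (70, 185), (72, 210), (74, 180), (75, 300), (77, 310)]
centers = [points[0], points[4]]

for _ in range(10):  # fixed iteration count for simplicity
    # Assign each point to its nearest center.
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[nearest].append(p)
    # Update each center to the coordinate-wise median of its cluster
    # (assumes no cluster empties, which holds for this data).
    centers = [tuple(statistics.median(p[d] for p in cluster) for d in range(2))
               for cluster in clusters]

print("centers:", centers)
print("clusters:", clusters)
```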

Supervised Learning Example

  • Uses labeled data (e.g., Receiver, Lineman).
  • Find a line or surface that separates the classes.
  • Validate with new examples (e.g., handling new running backs); see the sketch below.
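
The lecture fits a separating line between the classes; as a simpler runnable stand-in, here is a k-nearest-neighbors sketch over invented labeled (height, weight) examples:

```python
def euclidean(v1, v2):
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

# Invented labeled training data: (height, weight) -> position.
training = [((70, 180), "receiver"), ((72, 190), "receiver"),
            ((76, 305), "lineman"), ((77, 315), "lineman")]

def classify(example, k=3):
    """Label an example by majority vote among its k nearest neighbors."""
    nearest = sorted(training, key=lambda item: euclidean(example, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(classify((74, 230)))  # a new running back -> "receiver" here
```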

Feature Selection and Engineering

  • Selecting the right features is crucial.
  • Trade-off between signal and noise.
  • Example: features for classifying reptiles (scales, cold-blooded, legs, etc.); see the sketch after this list.
  • Avoid overfitting by simplifying feature vectors.
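
A sketch of how an unscaled feature can swamp the rest; the feature vectors (has scales, cold-blooded, lays eggs, number of legs) are a simplified version of the lecture's reptile example:

```python
def euclidean(v1, v2):
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

# Feature vectors: (has scales, cold-blooded, lays eggs, number of legs).
rattlesnake = (1, 1, 1, 0)
boa = (1, 1, 0, 0)
alligator = (1, 1, 1, 4)

print(euclidean(rattlesnake, boa))        # 1.0: the two snakes look similar
print(euclidean(rattlesnake, alligator))  # 4.0: the leg count dominates
# Fix: make "legs" binary (has legs or not) so no one feature swamps the rest.
```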

Distance Metrics for Feature Vectors

  • Minkowski Metric: general form dist(v1, v2, p) = (sum_i |v1_i - v2_i|^p)^(1/p).
  • Manhattan Distance (p = 1) vs. Euclidean Distance (p = 2).
  • The scaling and weighting of features matter.
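
A direct sketch of the metric, checking that p = 1 and p = 2 recover the Manhattan and Euclidean distances:

```python
def minkowski(v1, v2, p):
    """Minkowski distance: (sum_i |v1_i - v2_i|**p) ** (1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1 / p)

v1, v2 = (0, 0), (3, 4)
print(minkowski(v1, v2, 1))  # Manhattan distance: 7
print(minkowski(v1, v2, 2))  # Euclidean distance: 5.0
```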

Evaluation and Validation

  • Accuracy: fraction of instances labeled correctly, (TP + TN) / (TP + TN + FP + FN).
  • Confusion Matrix: True Positives (TP), True Negatives (TN), False Positives (FP), False Negatives (FN).
  • Positive Predictive Value (PPV), Sensitivity, Specificity: measures to evaluate classifiers (computed in the sketch below).
  • Trade-offs between sensitivity and specificity.
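
A sketch computing these measures from confusion-matrix counts (the counts below are invented):

```python
def evaluate(tp, fp, tn, fn):
    """Classifier statistics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    ppv = tp / (tp + fp)          # positive predictive value
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, ppv, sensitivity, specificity

print(evaluate(tp=40, fp=10, tn=45, fn=5))  # -> (0.85, 0.8, ~0.889, ~0.818)
```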

Next Steps in Course

  • Detailed learning models using labeled and unlabeled data.
  • Objective functions and optimization methods.
  • More examples and code explanations to follow.