Introduction to Machine Learning Overview

Jun 21, 2024

Introduction to Machine Learning

Differences from Traditional Rule-Based Programming

  • Traditional Rule-Based Programming:

    • Uses hard-coded rules formulated by domain experts.
    • Rules are applied to existing data to generate answers.
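    • Contrast: Arthur Samuel's checkers program at IBM, an early machine learning system, improved through play rather than relying only on hand-written rules.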
  • Machine Learning Approach:

    • Starts from data and a machine learning algorithm instead of hand-written rules.
    • The algorithm finds the rules itself and produces a learned model from the data.
    • The learned model should generalize to new, unseen data.
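
To make the contrast concrete, here is a minimal sketch. The data, the "approved" label, and the function names are hypothetical and exist only for illustration; the learner simply picks the cut-off that best fits the historical examples.

```python
# Hypothetical historical data: (years_of_experience, approved) pairs.
history = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1)]

def rule_based(years):
    """Hard-coded rule, as a (hypothetical) domain expert might write it."""
    return 1 if years >= 4 else 0

def learn_threshold(data):
    """Learn the rule from data: pick the cut-off with the best accuracy."""
    candidates = sorted({x for x, _ in data})

    def accuracy(t):
        return sum((1 if x >= t else 0) == y for x, y in data) / len(data)

    return max(candidates, key=accuracy)

threshold = learn_threshold(history)  # the "learned model" is just this number

def learned_model(years):
    """Apply the learned rule to new, unseen inputs."""
    return 1 if years >= threshold else 0

print(threshold, learned_model(7), learned_model(2))
```

In the rule-based version the number 4 comes from a person; in the learned version the cut-off comes from the data and would change if the data changed.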

Key Definitions and Concepts

  • Tom Mitchell’s Definition:

    • A computer program is said to learn from experience (E) with respect to a class of tasks (T) and a performance measure (P) if its performance on T, as measured by P, improves with E.
    • In short: performance on the task improves with experience.
  • Important Ingredients:

    • Data, represented as features.
    • An algorithm suited to the task we want to perform.
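
The E, T, P framing can be illustrated with a small sketch. Everything here is an assumption made for the example: the task (classify an integer as 1 if it is at least 8), the toy learner (a midpoint threshold), and the data.

```python
# Task T: classify an integer x as 1 or 0 (the true rule, x >= 8, is hidden from the learner).
# Experience E: the labelled examples seen so far.
# Performance P: accuracy on a fixed test set.

experience = [(1, 0), (10, 1), (4, 0), (9, 1), (7, 0), (8, 1)]
test_set = [(x, 1 if x >= 8 else 0) for x in range(1, 11)]

def fit_threshold(examples):
    """Place the cut-off halfway between the largest 0-example and the smallest 1-example."""
    largest_negative = max(x for x, y in examples if y == 0)
    smallest_positive = min(x for x, y in examples if y == 1)
    return (largest_negative + smallest_positive) / 2

def accuracy(threshold, data):
    correct = sum((1 if x >= threshold else 0) == y for x, y in data)
    return correct / len(data)

scores = []
for n in (2, 4, 6):  # growing amounts of experience
    t = fit_threshold(experience[:n])
    scores.append(accuracy(t, test_set))

print(scores)  # [0.8, 0.9, 1.0]
```

Here P rises as E grows, which is what the definition means by "learning". Real datasets will not always improve this cleanly.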

Understanding Features

  • Denoted as:

    • Lowercase x (single feature) or uppercase X (collection of features).
  • Examples:

    • Tabular Data: Numeric and string values like years of experience, age, gender.
    • Image Data: Pixel intensities, with the image files stored in folders.
    • Language Data: Text snippets or documents, used for tasks like translation.
    • Multimodal Data: Combination of various data types.
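
As a rough illustration of the x / X notation for tabular data, the sketch below builds a feature matrix from a few made-up records. The records, the column order, and the gender encoding are all hypothetical choices for this example.

```python
# Hypothetical tabular records mixing numeric and string values.
records = [
    {"years_experience": 3, "age": 29, "gender": "F"},
    {"years_experience": 10, "age": 41, "gender": "M"},
    {"years_experience": 1, "age": 23, "gender": "F"},
]

# String values must be turned into numbers before most algorithms can use them.
gender_codes = {"F": 0, "M": 1}  # arbitrary illustrative encoding

# X: the collection of features, one row per example, one column per feature.
X = [
    [r["years_experience"], r["age"], gender_codes[r["gender"]]]
    for r in records
]

# x1: a single feature (years of experience) across all examples.
x1 = [row[0] for row in X]

print(X)
print(x1)
```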

Labels

  • Denoted as: Lowercase y
  • Types of Labels:
    • Numerical Labels: Continuous values (e.g., insurance price prediction)
    • Categorical Labels: Classes like disease type, yes/no
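
The two label types might look like this in code. The values and class names are invented for illustration; mapping classes to integers is one common convention, not the only one.

```python
# Numerical labels: continuous values, e.g. insurance prices (made-up numbers).
y_price = [320.5, 410.0, 275.25]

# Categorical labels: classes, e.g. disease types (made-up names).
y_disease = ["flu", "cold", "flu"]

# Many algorithms expect categorical labels as integer class indices.
classes = sorted(set(y_disease))
class_to_index = {c: i for i, c in enumerate(classes)}
y_encoded = [class_to_index[c] for c in y_disease]

print(classes, y_encoded)  # ['cold', 'flu'] [1, 0, 1]
```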

Algorithms and Predictions

  • Weighted Combination of Features:

    • Predictive models combine features with learned weights to generate predictions.
    • Example: Predicting a healthcare score.
  • Training Stage:

    • Historical data is used to learn the weights (the model parameters).
    • Weights can be positive or negative (the direction of a feature's effect) and large or small (the strength of that effect).
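
A weighted combination of features is just a sum of weight-times-feature terms plus an optional constant. The sketch below assumes three unnamed features and hand-picked weights; in practice the weights would come from the training stage.

```python
def predict(features, weights, bias=0.0):
    """Combine features with weights: bias + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical weights: one positive, one small negative, one large positive.
weights = [2.0, -0.5, 10.0]
features = [3.0, 4.0, 1.0]

score = predict(features, weights, bias=1.0)
print(score)  # 1 + 6 - 2 + 10 = 15.0
```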

Linear Regression

  • Example: Predicting healthcare score based on historical data.
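  • Equation: $\hat{y} = 200 + 100x_1 - 25x_4$, where 200 is the intercept and 100 and -25 are the learned weights for features $x_1$ and $x_4$.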
  • Significance: Linear regression is foundational to many machine learning models, including neural networks.
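
The sketch below evaluates the example equation and then shows one way the weights could be learned from data. The training method here is plain gradient descent on mean squared error, chosen for illustration; the notes do not say which fitting method is used. The "historical" data is synthetic, generated from the equation itself.

```python
def predict(x1, x4, w0=200.0, w1=100.0, w4=-25.0):
    """Evaluate y_hat = w0 + w1*x1 + w4*x4 (defaults match the example equation)."""
    return w0 + w1 * x1 + w4 * x4

print(predict(2, 4))  # 200 + 200 - 100 = 300.0

# Synthetic historical data: every (x1, x4) pair on a small grid, with its score.
data = [(x1, x4, predict(x1, x4)) for x1 in range(3) for x4 in range(3)]

# Start from zero weights and learn them by gradient descent (an assumed method).
w0 = w1 = w4 = 0.0
learning_rate = 0.1
n = len(data)
for _ in range(2000):
    g0 = g1 = g4 = 0.0
    for x1, x4, y in data:
        error = (w0 + w1 * x1 + w4 * x4) - y
        g0 += 2 * error / n
        g1 += 2 * error * x1 / n
        g4 += 2 * error * x4 / n
    w0 -= learning_rate * g0
    w1 -= learning_rate * g1
    w4 -= learning_rate * g4

print(round(w0, 3), round(w1, 3), round(w4, 3))  # close to 200, 100, -25
```

Because the data is noise-free, the learned weights converge to the ones that generated it. With real data they would only approximate an underlying relationship.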

Types of Machine Learning

  • Supervised Learning:

    • Data comes with labels (numerical or categorical).
    • Examples: Regression (numeric prediction), Classification (categorical prediction)
  • Unsupervised Learning:

    • Data has no labels; the goal is to find patterns or structure in the data.
    • Example: Clustering

Problem Types and Algorithms

  • Regression:

    • Linear regression, k-nearest neighbors, neural networks, decision trees.
    • Example: Insurance price prediction.
  • Classification:

    • Support vector machines, neural networks, decision trees.
    • Example: Disease type prediction.
  • Clustering:

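    • k-means clustering. Principal component analysis (PCA) and collaborative filtering are often listed alongside it, but PCA is a dimensionality-reduction method and collaborative filtering is a recommendation technique rather than a clustering algorithm.
    • Example: Grouping people by skill and experience.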
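
Some algorithms in the lists above serve more than one problem type. As a minimal sketch, k-nearest neighbors can do regression (average the neighbors' numeric labels) or classification (majority vote over the neighbors' classes). The single "age" feature, the data, and the function names are hypothetical.

```python
from collections import Counter

def k_nearest_labels(train, query, k):
    """Return the labels of the k training points closest to the query."""
    by_distance = sorted(train, key=lambda pair: abs(pair[0] - query))
    return [label for _, label in by_distance[:k]]

def knn_regress(train, query, k=3):
    """Regression: average the numeric labels of the nearest neighbors."""
    labels = k_nearest_labels(train, query, k)
    return sum(labels) / len(labels)

def knn_classify(train, query, k=3):
    """Classification: majority vote among the nearest neighbors."""
    labels = k_nearest_labels(train, query, k)
    return Counter(labels).most_common(1)[0][0]

# Hypothetical (age, label) pairs.
prices = [(20, 100.0), (30, 150.0), (40, 200.0), (50, 250.0), (60, 300.0)]
diseases = [(20, "A"), (30, "A"), (40, "B"), (50, "B"), (60, "B")]

print(knn_regress(prices, 42))     # neighbors at 40, 50, 30 -> 200.0
print(knn_classify(diseases, 42))  # neighbor labels B, B, A -> B
```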

Deep Dive into Problem Types

  • Regression:

    • Predicts continuous values, e.g., a healthcare score.
    • Usually visualized by plotting features against predicted values.
  • Classification:

    • Predicts categories (yes/no, types).
    • Usually visualized with decision boundaries that separate the classes.
  • Clustering:

    • Finds patterns (groups) in datasets without labels.
    • Example: k-means clustering groups similar data points into k clusters.
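
A bare-bones k-means sketch is shown below. It assumes 2-D points (for instance skill and experience scores, made up here), a naive initialization from the first k points, and a fixed number of iterations. Library implementations use better initialization and stopping criteria.

```python
def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (skill, experience) scores forming two loose groups.
points = [(1, 1), (9, 8), (2, 1), (8, 9), (1, 2), (9, 9)]
centroids, clusters = kmeans(points, k=2)

print(clusters[0])  # [(1, 1), (2, 1), (1, 2)]
print(clusters[1])  # [(9, 8), (8, 9), (9, 9)]
```

No labels are used anywhere: the grouping comes only from distances between points.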