
Principal Component Analysis (PCA) Overview

Jul 13, 2024

Principal Component Analysis (PCA) Lecture Notes

Review of the Previous Video

  • Main Purpose of PCA: Reduce the dimensionality of data.
  • Benefit: Fewer, less redundant features can make machine learning algorithms faster and less prone to overfitting, which often leads to better results.

Today's Learning Goals

  • Explore the mathematical aspects of PCA.
  • Describe the problem and solution.
  • Demonstrate with code.

What Kind of Problem Does PCA Solve?

Example: Data Reduction from 2D to 1D

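  • Goal: Project the data onto a single axis, i.e. a line whose direction is given by a unit vector u.
  • Formula:
    • Projection Formula: for a data point x and a unit vector u ($\lVert u \rVert = 1$), the projection is the scalar $z = u^\top x$, and the projected point is $z\,u = (u^\top x)\,u$.
    • Find the unit vector u that maximizes the variance of the projected values z.
  • How to do it:
    • Project all data points onto a candidate unit vector.
    • Calculate the variance of the projected values for that candidate.
    • Repeat for other candidates and choose the unit vector with the maximum variance.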
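The search described above can be sketched in Python with NumPy for the 2D case. The toy data, the 180 candidate angles and the function name `max_variance_direction` are illustrative assumptions, not taken from the lecture. A brute-force scan like this is only practical in very low dimensions; the eigenvector method covered in these notes solves the same problem directly.

```python
import numpy as np

def max_variance_direction(X, n_angles=180):
    """Brute-force search (hypothetical helper) for the unit vector u that
    maximizes the variance of the projections u^T x, for 2D data X of shape (n, 2)."""
    Xc = X - X.mean(axis=0)                        # center the data
    best_u, best_var = None, -np.inf
    # Angles in [0, pi) cover every direction up to a sign flip.
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        u = np.array([np.cos(theta), np.sin(theta)])   # unit vector at angle theta
        z = Xc @ u                                  # scalar projections onto u
        var = z.var()                               # variance of the projected values
        if var > best_var:
            best_u, best_var = u, var
    return best_u, best_var

rng = np.random.default_rng(0)
# Hypothetical correlated 2D data: the second feature roughly follows the first.
x1 = rng.normal(size=500)
X = np.column_stack([x1, 0.5 * x1 + 0.2 * rng.normal(size=500)])

u, var = max_variance_direction(X)
print("best direction:", u, "variance:", var)
```

The best variance found this way should come out very close to the largest eigenvalue of the covariance matrix computed with the same 1/n normalization.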

Covariance and Variance

Covariance

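  • Definition: Covariance measures how two features vary together. It is positive when they tend to increase together, negative when one tends to decrease as the other increases, and near zero when there is no linear relationship. The covariance of a feature with itself is its variance.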
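  • Calculation (sample covariance of features X and Y over n samples):
    $\operatorname{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
    (dividing by n instead of n − 1 gives the population covariance).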
  • Covariance Matrix:
    • Variances of each feature on the diagonal, covariances between pairs of features off the diagonal.
    • Symmetric, because cov(X, Y) = cov(Y, X).
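A minimal sketch of the covariance formula in NumPy, checked against `np.cov`. The small data matrix and the helper name `covariance_matrix` are made up for illustration; rows are samples and columns are features, which is why `np.cov` is called with `rowvar=False`.

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance matrix of X (rows = samples, columns = features),
    using the 1/(n-1) normalization."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # subtract each feature's mean
    return (Xc.T @ Xc) / (n - 1)     # entry (j, k) = cov(feature j, feature k)

# Hypothetical data: 5 samples, 3 features.
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [6.0, 1.5, 2.0],
              [8.0, 3.0, 5.0],
              [10.0, 4.5, 4.0]])

C = covariance_matrix(X)
print(np.round(C, 3))
print("symmetric:", np.allclose(C, C.T))
print("matches np.cov:", np.allclose(C, np.cov(X, rowvar=False)))
print("diagonal = variances:", np.allclose(np.diag(C), X.var(axis=0, ddof=1)))
```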

Eigenvectors and Eigenvalues

Their Importance

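  • Eigenvectors: Nonzero vectors v whose direction is unchanged when multiplied by a matrix A, so that $Av = \lambda v$.
  • Eigenvalues: The scalar λ, which indicates how much the eigenvector is stretched or shrunk (a negative λ also reverses it). For a covariance matrix, each eigenvalue equals the variance of the data along its eigenvector.
  • Choosing Principal Components: In PCA, select the eigenvector with the largest eigenvalue, since it points in the direction of maximum variance.
    • With the eigenvectors sorted by eigenvalue (largest first), the top k eigenvectors become the first k principal components.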
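The link between maximizing the projected variance and the eigenvectors of the covariance matrix can be written out as a short derivation. Here S is the covariance matrix of the data points $x_i$ (with the 1/(n − 1) normalization) and $z_i = u^\top x_i$ are the projections from the 2D-to-1D example; the Lagrange-multiplier argument is the standard one and not necessarily the exact route taken in the lecture.

```latex
\begin{aligned}
\operatorname{Var}(z) &= \frac{1}{n-1}\sum_{i=1}^{n} \left(u^\top x_i - u^\top \bar{x}\right)^2
  = u^\top \left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top\right] u
  = u^\top S\, u \\
\mathcal{L}(u, \lambda) &= u^\top S\, u - \lambda \left(u^\top u - 1\right) \\
\nabla_u \mathcal{L} &= 2 S u - 2 \lambda u = 0 \quad\Longrightarrow\quad S u = \lambda u \\
\operatorname{Var}(z) &= u^\top S\, u = \lambda\, u^\top u = \lambda
\end{aligned}
```

So every stationary direction is an eigenvector of S, the variance along it equals its eigenvalue, and the maximum is reached at the eigenvector with the largest eigenvalue.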

PCA Step-by-step Process

Problem Formulation

  • Reduce a 3D dataset to 2D.
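    1. Centering Data: Subtract each feature's mean.
    2. Covariance Matrix Calculation: Calculate the variances and covariances of the features.
    3. Extract Eigenvalues and Eigenvectors: Compute them from the covariance matrix and keep the two eigenvectors with the largest eigenvalues.
    4. Projections: Project the centered data onto these two eigenvectors, which serve as the new axes.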
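The four steps can be sketched as a small NumPy function. The function name `pca_reduce` and the random 3D data are hypothetical. Eigenvector signs are arbitrary, so the projected coordinates may differ in sign from another implementation.

```python
import numpy as np

def pca_reduce(X, k=2):
    """Reduce X (n_samples, n_features) to k dimensions with PCA.
    Returns the projected data, all eigenvalues (descending) and the top-k components."""
    # 1. Centering: subtract each feature's mean.
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (columns are variables).
    S = np.cov(Xc, rowvar=False)
    # 3. Eigen-decomposition; eigh returns eigenvalues in ascending order,
    #    so reorder to put the largest-variance directions first.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                 # top-k eigenvectors as columns
    # 4. Projection onto the new axes.
    return Xc @ W, eigvals, W

rng = np.random.default_rng(42)
# Hypothetical 3D data that mostly varies within a 2D plane.
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0],
                     base[:, 1],
                     base[:, 0] + base[:, 1] + 0.1 * rng.normal(size=200)])

Z, eigvals, W = pca_reduce(X, k=2)
print(Z.shape)                         # (200, 2)
print("explained variance ratio:", eigvals[:2].sum() / eigvals.sum())
```

Because the third feature is almost a linear combination of the other two, the first two components should capture nearly all of the variance here.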

Practical Implementation

  • Prepare the Dataset: A data frame with three columns F1, F2, F3.
  • Plot 3D Data.
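  • Centering Data: Using preprocessing.scale (scikit-learn). By default it also divides each column by its standard deviation; pass with_std=False to center only.
  • Extract Covariance Matrix: Using NumPy (e.g. np.cov, which needs rowvar=False when the features are columns).
  • Eigenvalues and Eigenvectors: Using np.linalg.eigh, which is meant for symmetric matrices and returns the eigenvalues in ascending order, so sort them in descending order before choosing components.
  • Data Projection: Create the new 2D data as the dot product of the centered data with the top two eigenvectors.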
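The practical steps above can be sketched with pandas, scikit-learn and NumPy. The column names F1, F2, F3 come from the notes; the random values are hypothetical and the 3D plot is omitted. As a sanity check, the result is compared with `sklearn.decomposition.PCA`, which the notes do not mention; the comparison uses absolute values because eigenvector signs are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
# Hypothetical data frame with three features, F1 and F2 strongly correlated.
f1 = rng.normal(size=100)
df = pd.DataFrame({
    "F1": f1,
    "F2": 2.0 * f1 + rng.normal(scale=0.5, size=100),
    "F3": rng.normal(size=100),
})

# Standardize: preprocessing.scale subtracts each column's mean and divides by its std.
X = preprocessing.scale(df)

# Covariance matrix with features as columns.
cov = np.cov(X, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix, sorted largest first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top two eigenvectors via a dot product.
projected = pd.DataFrame(X.dot(eigvecs[:, :2]), columns=["PC1", "PC2"])
print(projected.head())

# Cross-check against scikit-learn's PCA (component signs may differ).
sk = PCA(n_components=2).fit_transform(X)
print("matches sklearn:", np.allclose(np.abs(projected.values), np.abs(sk)))
```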

Summary

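  • PCA reduces the number of dimensions while keeping the directions of greatest variance, which is what makes the reduced data useful in practice.
  • The lecture closes with a discussion of real-world applications of data reduction.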