Notes for Mathematics for Machine Learning

Jul 19, 2024

Mathematics for Machine Learning Specialization

Introduction

  • Machine learning uses mathematical tools like matrix notation & calculus
  • Specialization aims to build intuition rather than focus on heavy technical detail
  • Ideal for getting confidence & motivation to explore applied ML courses
  • Designed for accessibility beyond computer scientists

Course 1: Linear Algebra

Module 1: Vectors and Spaces

  • Vectors: Objects that move us around space; they can also represent lists of data attributes
    • Vector Addition: v + u = w
    • Scalar Multiplication: c * v = w
  • Dot Product: u · v = |u| |v| cos θ; measures how aligned two vectors are; commutative and distributive over addition
  • Magnitudes: Length of a vector via the dot product, |v| = sqrt(v · v) (see the sketch below)
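
A minimal NumPy sketch of these operations; the vectors u and v are arbitrary examples, not taken from the course:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

print(u + v)                        # vector addition
print(2.0 * u)                      # scalar multiplication
dot = np.dot(u, v)                  # dot product: 1*3 + 2*(-1) = 1
norm_u = np.sqrt(np.dot(u, u))      # magnitude via the dot product: |u| = sqrt(u . u)
cos_theta = dot / (norm_u * np.linalg.norm(v))   # from u . v = |u| |v| cos(theta)
print(dot, norm_u, cos_theta)
```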

Module 2: Matrices

  • Matrix Transformation: Rotations, scalings, reflections
  • Matrix Compositions: Successive transformations
    • Associative but not commutative
  • Inverse Matrices: The inverse A^-1 of a matrix A satisfies A * A^-1 = I (the identity matrix); see the sketch after this list
  • Determinants: Scalar value representing the matrix’s scaling effect
    • A zero determinant means the matrix has no inverse
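
A short NumPy sketch of these ideas, using an assumed 90-degree rotation, a scaling, and a small example matrix A:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],    # rotation by 90 degrees
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([2.0, 0.5])                           # scaling transformation

v = np.array([1.0, 0.0])
print(S @ R @ v)                                  # composition: rotate first, then scale -> ~[0, 0.5]
print(np.allclose(S @ R, R @ S))                  # False: composition is not commutative

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.det(A))                           # -2: nonzero, so A has an inverse
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # A * A^-1 = I
```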

Module 3: Linear Equations

  • Gaussian Elimination: Solve linear systems by transforming the augmented matrix to row echelon form (sketched after this list)
  • Inverse Calculation: Utilize row operations for matrix inversion
  • Special Cases: Handling of non-invertible matrices
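
A compact sketch of Gaussian elimination with partial pivoting on an assumed 2x2 system, compared against NumPy's built-in solver:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve A x = b: forward elimination to row echelon form, then back-substitution."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))          # partial pivoting: pick the largest pivot
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]                 # eliminate the entry below the pivot
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # back-substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 11.0])
print(gaussian_elimination(A, b))   # [1. 2.]
print(np.linalg.solve(A, b))        # NumPy's solver agrees
```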

Module 4: Eigenvalues and Eigenvectors

  • Eigenvalues/Vectors: An eigenvector’s direction is unchanged by the transformation; it is only scaled by its eigenvalue
  • Diagonalization: Writing T = C D C^-1 (D diagonal) simplifies raising matrices to powers, T^n = C D^n C^-1 (see the sketch below)
    • Useful for computational optimizations
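
A quick NumPy check of diagonalization on an arbitrary symmetric example matrix T:

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, C = np.linalg.eig(T)              # columns of C are eigenvectors of T
D = np.diag(eigvals)                       # diagonal matrix of eigenvalues
print(np.allclose(T, C @ D @ np.linalg.inv(C)))            # T = C D C^-1

n = 5                                      # T^n = C D^n C^-1: only the diagonal is powered
T_pow = C @ np.diag(eigvals ** n) @ np.linalg.inv(C)
print(np.allclose(T_pow, np.linalg.matrix_power(T, n)))    # True

v = C[:, 0]                                # an eigenvector keeps its direction under T
print(np.allclose(T @ v, eigvals[0] * v))                  # T v = lambda v
```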

Module 5: Applications

  • Google PageRank: Ranks web pages by treating link-following as probabilities; the rank vector is an eigenvector of the link matrix with eigenvalue 1 (toy sketch below)
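
A toy sketch of the PageRank idea: power iteration on a damped, column-stochastic link matrix for three assumed pages (the link matrix and damping factor 0.85 are illustrative choices):

```python
import numpy as np

# L[i, j] = probability of following a link from page j to page i (each column sums to 1)
L = np.array([[0.0, 0.0, 1.0],     # page 0 is linked to by page 2
              [0.5, 0.0, 0.0],     # page 1 is linked to by page 0
              [0.5, 1.0, 0.0]])    # page 2 is linked to by pages 0 and 1

d = 0.85                                    # damping factor
n = L.shape[0]
M = d * L + (1 - d) / n * np.ones((n, n))   # damped link matrix (still column-stochastic)

r = np.ones(n) / n                          # start with equal rank for every page
for _ in range(100):                        # power iteration: r converges to the
    r = M @ r                               # eigenvector of M with eigenvalue 1

print(r)                                    # the PageRank of each page
```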

Course 2: Multivariate Calculus

Module 1: Fundamentals of Calculus

  • Derivatives: Measure how a function changes; defined as the limit of rise over run as the run shrinks to zero (numerical check after this list)
  • Rules: Sum rule, Power rule, Product rule, Chain rule
    • Example: f(x) = x^2, f'(x) = 2x
  • Special Cases: 1/x, e^x, trigonometric functions
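
A quick sanity check of these rules, comparing a rise-over-run estimate with the known derivatives; the step size h and test points are arbitrary:

```python
import math

def rise_over_run(f, x, h=1e-6):
    """Numerical derivative estimate: (f(x + h) - f(x)) / h for a small step h."""
    return (f(x + h) - f(x)) / h

print(rise_over_run(lambda x: x ** 2, 3.0))                 # ~6.0, matching f'(x) = 2x at x = 3
print(rise_over_run(math.exp, 1.0), math.exp(1.0))          # e^x is its own derivative
print(rise_over_run(lambda x: 1 / x, 2.0), -1 / 2.0 ** 2)   # d/dx (1/x) = -1/x^2
```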

Module 2: Multivariate Calculus

  • Partial Derivatives: Derivatives of multi-variable functions, taken with respect to one variable at a time
  • Jacobian Matrix: Matrix of first-order partial derivatives; for a scalar-valued function it reduces to the gradient (a row vector)
  • Hessian: Matrix of second-order partial derivatives, helps classify extrema
    • Critical Points: Points where the gradient is zero; the Hessian distinguishes minima, maxima, and saddle points (see the sketch below)
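
A small sketch of the gradient and Hessian for an assumed example function f(x, y) = x^2 + x*y + y^2, including a critical-point check:

```python
import numpy as np

# example function f(x, y) = x^2 + x*y + y^2 (an illustrative choice)
def grad(x, y):
    """Gradient / Jacobian of f: its first-order partial derivatives."""
    return np.array([2 * x + y, x + 2 * y])

H = np.array([[2.0, 1.0],          # Hessian of f: second-order partial derivatives
              [1.0, 2.0]])         # (constant here because f is quadratic)

print(grad(0.0, 0.0))              # [0, 0] -> (0, 0) is a critical point
print(np.linalg.eigvals(H))        # both eigenvalues positive -> a minimum, not a saddle
```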

Module 3: Optimization

  • Gradient Descent: Finds local minima by repeatedly stepping along the negative gradient (sketched after this list)
  • Lagrange Multipliers: Optimize functions with constraints
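
A minimal gradient-descent sketch on the same assumed quadratic f(x, y) = x^2 + x*y + y^2; the starting point, step size, and iteration count are arbitrary choices:

```python
import numpy as np

def grad(p):                         # gradient of f(x, y) = x^2 + x*y + y^2
    x, y = p
    return np.array([2 * x + y, x + 2 * y])

p = np.array([3.0, -2.0])            # arbitrary starting point
step = 0.1                           # learning rate
for _ in range(200):
    p = p - step * grad(p)           # step downhill along the negative gradient

print(p)                             # approaches the minimum at (0, 0)
```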

Module 4: Power Series

  • Taylor Series: Approximates functions using polynomials
    • f(x) ≈ f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! +...
  • Linearization: First-order approximation; higher-order terms are treated as the error (see the sketch below)
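
A short sketch comparing Taylor polynomials of e^x around a = 0 with the true value; the function and expansion point are illustrative choices:

```python
import math

def taylor_exp(x, terms):
    """Taylor series of e^x around a = 0: sum of x^k / k! for k = 0 .. terms-1."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

x = 0.5
print(math.exp(x))          # true value
print(taylor_exp(x, 2))     # linearization 1 + x: rough
print(taylor_exp(x, 5))     # more terms: much closer
```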

Module 5: Numerical Methods

  • Finite Differences: Numerical differentiation using discrete points
  • Data Fitting: Using least squares for linear & non-linear models
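
A brief sketch of both ideas: a central finite-difference derivative estimate, and a least-squares straight-line fit to synthetic noisy data (all numbers are assumptions):

```python
import numpy as np

# central finite difference: f'(x) ~ (f(x + h) - f(x - h)) / (2h)
x, h = 1.0, 1e-5
print((np.sin(x + h) - np.sin(x - h)) / (2 * h), np.cos(x))   # estimate vs exact derivative

# least-squares fit of y ~ m*x + c to noisy synthetic data
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 20)
ys = 2.0 * xs + 1.0 + 0.05 * rng.standard_normal(20)
design = np.column_stack([xs, np.ones_like(xs)])              # design matrix [x, 1]
(m, c), *_ = np.linalg.lstsq(design, ys, rcond=None)
print(m, c)                                                   # close to the true slope 2 and intercept 1
```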

Course 3: Principal Component Analysis (PCA)

Module 1: Statistical Concepts

  • Mean & Variance: The mean describes the center of the data, the variance its spread
  • Covariance: Measures how two variables vary together
    • Covariance Matrix: Represents variances & covariances in multiple dimensions
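
A quick NumPy sketch of these quantities on synthetic data (rows are samples, columns are variables; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))      # 100 samples of 2 variables
X[:, 1] += 0.8 * X[:, 0]               # make the two variables correlated

print(X.mean(axis=0))                  # per-variable mean (center of the data)
print(X.var(axis=0))                   # per-variable variance (spread)
print(np.cov(X, rowvar=False))         # 2x2 covariance matrix; off-diagonals show the dependency
```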

Module 2: Inner Products & Geometry

  • Inner Product: Generalizes dot product to compute lengths & angles
    • Basis for defining orthogonality
  • Orthogonal Projections: Project data onto lower-dimensional subspaces (sketched below)
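
A small sketch of orthogonal projection onto the line spanned by a vector b, using the standard dot product as the inner product; the vectors are arbitrary examples:

```python
import numpy as np

b = np.array([1.0, 1.0])                  # 1-D subspace spanned by b
x = np.array([3.0, 1.0])

proj = (x @ b) / (b @ b) * b              # orthogonal projection of x onto span{b}
print(proj)                               # [2. 2.]

print(np.isclose((x - proj) @ b, 0.0))    # the residual is orthogonal to b
```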

Module 3: PCA Derivation

  • Objectives: Minimize squared reconstruction error
    • Find principal subspace retaining maximum variance
  • Eigen Decomposition: Uses data covariance matrix
    • Principal subspace is spanned by the eigenvectors with the largest eigenvalues (checked in the sketch below)
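
A small numerical check of this fact on synthetic data: the variance of the data projected onto an eigenvector of the covariance matrix equals that eigenvalue, so the top eigenvectors retain the most variance:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0],
                                              [0.8, 0.5]])    # correlated synthetic data
X = X - X.mean(axis=0)                                        # center the data

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                        # eigh: covariance matrices are symmetric

top = eigvecs[:, np.argmax(eigvals)]                          # direction of the largest eigenvalue
print(np.var(X @ top, ddof=1), eigvals.max())                 # projected variance equals that eigenvalue
```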

Module 4: Practical PCA

  • Preprocessing: Centering data, scaling by standard deviation
  • Algorithm Steps: Compute covariance, eigen decomposition, project data
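
A compact end-to-end sketch of these steps on synthetic 3-D data reduced to 2 principal components (the data and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.standard_normal(200)   # a nearly redundant third variable

# 1. preprocessing: center each variable and scale by its standard deviation
Xc = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. covariance matrix and its eigendecomposition
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                      # eigenvalues in ascending order

# 3. keep the eigenvectors with the largest eigenvalues and project the data onto them
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]                          # top-2 principal directions
Z = Xc @ components                                         # reduced-dimension data

print(Z.shape)                                              # (200, 2)
print(eigvals[order] / eigvals.sum())                       # fraction of variance per component
```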