Notes for Mathematics for Machine Learning

Jul 19, 2024

Mathematics for Machine Learning Specialization

Introduction

  • Machine learning uses mathematical tools like matrix notation & calculus
  • Specialization aims to build intuition rather than focus on heavy technical detail
  • Ideal for getting confidence & motivation to explore applied ML courses
  • Designed for accessibility beyond computer scientists

Course 1: Linear Algebra

Module 1: Vectors and Spaces

  • Vectors: Objects that move us around space; they can also represent lists of data attributes
    • Vector Addition: v + u = w
    • Scalar Multiplication: c * v = w
  • Dot Product: u · v = |u| |v| cos θ; measures how aligned two vectors are; commutative and distributive over addition
  • Magnitudes: Length of a vector via the dot product, |v| = sqrt(v · v) (see the sketch below)
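
A minimal NumPy sketch of these operations; the vectors u and v are arbitrary examples, not taken from the course:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

print(u + v)                        # vector addition
print(2.0 * u)                      # scalar multiplication
dot = np.dot(u, v)                  # dot product: 1*3 + 2*(-1) = 1
norm_u = np.sqrt(np.dot(u, u))      # magnitude via the dot product: |u| = sqrt(u . u)
cos_theta = dot / (norm_u * np.linalg.norm(v))   # from u . v = |u| |v| cos(theta)
print(dot, norm_u, cos_theta)
```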

Module 2: Matrices

  • Matrix Transformation: Rotations, scalings, reflections
  • Matrix Compositions: Successive transformations
    • Associative but not commutative
  • Inverse Matrices: The inverse A^-1 of a matrix A satisfies A * A^-1 = I (the identity matrix); see the sketch after this list
  • Determinants: Scalar value representing the matrix’s scaling effect
    • A zero determinant means the matrix has no inverse
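
A short NumPy sketch of these ideas, using an assumed 90-degree rotation, a scaling, and a small example matrix A:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],    # rotation by 90 degrees
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([2.0, 0.5])                           # scaling transformation

v = np.array([1.0, 0.0])
print(S @ R @ v)                                  # composition: rotate first, then scale -> ~[0, 0.5]
print(np.allclose(S @ R, R @ S))                  # False: composition is not commutative

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.det(A))                           # -2: nonzero, so A has an inverse
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # A * A^-1 = I
```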

Module 3: Linear Equations

  • Gaussian Elimination: Solve linear systems by transforming the augmented matrix to row echelon form (sketched after this list)
  • Inverse Calculation: Utilize row operations for matrix inversion
  • Special Cases: Handling of non-invertible matrices
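
A compact sketch of Gaussian elimination with partial pivoting on an assumed 2x2 system, compared against NumPy's built-in solver:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve A x = b: forward elimination to row echelon form, then back-substitution."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))          # partial pivoting: pick the largest pivot
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]                 # eliminate the entry below the pivot
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # back-substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 11.0])
print(gaussian_elimination(A, b))   # [1. 2.]
print(np.linalg.solve(A, b))        # NumPy's solver agrees
```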

Module 4: Eigenvalues and Eigenvectors

  • Eigenvalues/Vectors: An eigenvector’s direction is unchanged by the transformation; it is only scaled by its eigenvalue
  • Diagonalization: Writing T = C D C^-1 (D diagonal) simplifies raising matrices to powers, T^n = C D^n C^-1 (see the sketch below)
    • Useful for computational optimizations
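
A quick NumPy check of diagonalization on an arbitrary symmetric example matrix T:

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, C = np.linalg.eig(T)              # columns of C are eigenvectors of T
D = np.diag(eigvals)                       # diagonal matrix of eigenvalues
print(np.allclose(T, C @ D @ np.linalg.inv(C)))            # T = C D C^-1

n = 5                                      # T^n = C D^n C^-1: only the diagonal is powered
T_pow = C @ np.diag(eigvals ** n) @ np.linalg.inv(C)
print(np.allclose(T_pow, np.linalg.matrix_power(T, n)))    # True

v = C[:, 0]                                # an eigenvector keeps its direction under T
print(np.allclose(T @ v, eigvals[0] * v))                  # T v = lambda v
```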

Module 5: Applications

  • Google PageRank: Ranks web pages by treating link-following as probabilities; the rank vector is an eigenvector of the link matrix with eigenvalue 1 (toy sketch below)
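
A toy sketch of the PageRank idea: power iteration on a damped, column-stochastic link matrix for three assumed pages (the link matrix and damping factor 0.85 are illustrative choices):

```python
import numpy as np

# L[i, j] = probability of following a link from page j to page i (each column sums to 1)
L = np.array([[0.0, 0.0, 1.0],     # page 0 is linked to by page 2
              [0.5, 0.0, 0.0],     # page 1 is linked to by page 0
              [0.5, 1.0, 0.0]])    # page 2 is linked to by pages 0 and 1

d = 0.85                                    # damping factor
n = L.shape[0]
M = d * L + (1 - d) / n * np.ones((n, n))   # damped link matrix (still column-stochastic)

r = np.ones(n) / n                          # start with equal rank for every page
for _ in range(100):                        # power iteration: r converges to the
    r = M @ r                               # eigenvector of M with eigenvalue 1

print(r)                                    # the PageRank of each page
```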

Course 2: Multivariate Calculus

Module 1: Fundamentals of Calculus

  • Derivatives: Measure how a function changes; defined as the limit of rise over run as the run shrinks to zero (numerical check after this list)
  • Rules: Sum rule, Power rule, Product rule, Chain rule
    • Example: f(x) = x^2, f'(x) = 2x
  • Special Cases: 1/x, e^x, trigonometric functions
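
A quick sanity check of these rules, comparing a rise-over-run estimate with the known derivatives; the step size h and test points are arbitrary:

```python
import math

def rise_over_run(f, x, h=1e-6):
    """Numerical derivative estimate: (f(x + h) - f(x)) / h for a small step h."""
    return (f(x + h) - f(x)) / h

print(rise_over_run(lambda x: x ** 2, 3.0))                 # ~6.0, matching f'(x) = 2x at x = 3
print(rise_over_run(math.exp, 1.0), math.exp(1.0))          # e^x is its own derivative
print(rise_over_run(lambda x: 1 / x, 2.0), -1 / 2.0 ** 2)   # d/dx (1/x) = -1/x^2
```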

Module 2: Multivariate Calculus

  • Partial Derivatives: Derivatives of multi-variable functions, taken with respect to one variable at a time
  • Jacobian Matrix: Matrix of first-order partial derivatives; for a scalar-valued function it reduces to the gradient (a row vector)
  • Hessian: Matrix of second-order partial derivatives, helps classify extrema
    • Critical Points: Points where the gradient is zero; the Hessian distinguishes minima, maxima, and saddle points (see the sketch below)
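
A small sketch of the gradient and Hessian for an assumed example function f(x, y) = x^2 + x*y + y^2, including a critical-point check:

```python
import numpy as np

# example function f(x, y) = x^2 + x*y + y^2 (an illustrative choice)
def grad(x, y):
    """Gradient / Jacobian of f: its first-order partial derivatives."""
    return np.array([2 * x + y, x + 2 * y])

H = np.array([[2.0, 1.0],          # Hessian of f: second-order partial derivatives
              [1.0, 2.0]])         # (constant here because f is quadratic)

print(grad(0.0, 0.0))              # [0, 0] -> (0, 0) is a critical point
print(np.linalg.eigvals(H))        # both eigenvalues positive -> a minimum, not a saddle
```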

Module 3: Optimization

  • Gradient Descent: Finds local minima by repeatedly stepping along the negative gradient (sketched after this list)
  • Lagrange Multipliers: Optimize functions with constraints
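
A minimal gradient-descent sketch on the same assumed quadratic f(x, y) = x^2 + x*y + y^2; the starting point, step size, and iteration count are arbitrary choices:

```python
import numpy as np

def grad(p):                         # gradient of f(x, y) = x^2 + x*y + y^2
    x, y = p
    return np.array([2 * x + y, x + 2 * y])

p = np.array([3.0, -2.0])            # arbitrary starting point
step = 0.1                           # learning rate
for _ in range(200):
    p = p - step * grad(p)           # step downhill along the negative gradient

print(p)                             # approaches the minimum at (0, 0)
```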

Module 4: Power Series

  • Taylor Series: Approximates functions using polynomials
    • f(x) ≈ f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! +...
  • Linearization: First-order approximation; higher-order terms are treated as the error (see the sketch below)
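
A short sketch comparing Taylor polynomials of e^x around a = 0 with the true value; the function and expansion point are illustrative choices:

```python
import math

def taylor_exp(x, terms):
    """Taylor series of e^x around a = 0: sum of x^k / k! for k = 0 .. terms-1."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

x = 0.5
print(math.exp(x))          # true value
print(taylor_exp(x, 2))     # linearization 1 + x: rough
print(taylor_exp(x, 5))     # more terms: much closer
```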

Module 5: Numerical Methods

  • Finite Differences: Numerical differentiation using discrete points
  • Data Fitting: Using least squares for linear & non-linear models
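
A brief sketch of both ideas: a central finite-difference derivative estimate, and a least-squares straight-line fit to synthetic noisy data (all numbers are assumptions):

```python
import numpy as np

# central finite difference: f'(x) ~ (f(x + h) - f(x - h)) / (2h)
x, h = 1.0, 1e-5
print((np.sin(x + h) - np.sin(x - h)) / (2 * h), np.cos(x))   # estimate vs exact derivative

# least-squares fit of y ~ m*x + c to noisy synthetic data
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 20)
ys = 2.0 * xs + 1.0 + 0.05 * rng.standard_normal(20)
design = np.column_stack([xs, np.ones_like(xs)])              # design matrix [x, 1]
(m, c), *_ = np.linalg.lstsq(design, ys, rcond=None)
print(m, c)                                                   # close to the true slope 2 and intercept 1
```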

Course 3: Principal Component Analysis (PCA)

Module 1: Statistical Concepts

  • Mean & Variance: The mean describes the center of the data, the variance its spread
  • Covariance: Measures how two variables vary together
    • Covariance Matrix: Represents variances & covariances in multiple dimensions
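
A quick NumPy sketch of these quantities on synthetic data (rows are samples, columns are variables; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))      # 100 samples of 2 variables
X[:, 1] += 0.8 * X[:, 0]               # make the two variables correlated

print(X.mean(axis=0))                  # per-variable mean (center of the data)
print(X.var(axis=0))                   # per-variable variance (spread)
print(np.cov(X, rowvar=False))         # 2x2 covariance matrix; off-diagonals show the dependency
```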

Module 2: Inner Products & Geometry

  • Inner Product: Generalizes dot product to compute lengths & angles
    • Basis for defining orthogonality
  • Orthogonal Projections: Project data onto lower-dimensional subspaces (sketched below)
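
A small sketch of orthogonal projection onto the line spanned by a vector b, using the standard dot product as the inner product; the vectors are arbitrary examples:

```python
import numpy as np

b = np.array([1.0, 1.0])                  # 1-D subspace spanned by b
x = np.array([3.0, 1.0])

proj = (x @ b) / (b @ b) * b              # orthogonal projection of x onto span{b}
print(proj)                               # [2. 2.]

print(np.isclose((x - proj) @ b, 0.0))    # the residual is orthogonal to b
```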

Module 3: PCA Derivation

  • Objectives: Minimize squared reconstruction error
    • Find principal subspace retaining maximum variance
  • Eigen Decomposition: Uses data covariance matrix
    • Principal subspace is spanned by the eigenvectors with the largest eigenvalues (checked in the sketch below)
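
A small numerical check of this fact on synthetic data: the variance of the data projected onto an eigenvector of the covariance matrix equals that eigenvalue, so the top eigenvectors retain the most variance:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0],
                                              [0.8, 0.5]])    # correlated synthetic data
X = X - X.mean(axis=0)                                        # center the data

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                        # eigh: covariance matrices are symmetric

top = eigvecs[:, np.argmax(eigvals)]                          # direction of the largest eigenvalue
print(np.var(X @ top, ddof=1), eigvals.max())                 # projected variance equals that eigenvalue
```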

Module 4: Practical PCA

  • Preprocessing: Centering data, scaling by standard deviation
  • Algorithm Steps: Compute covariance, eigen decomposition, project data
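
A compact end-to-end sketch of these steps on synthetic 3-D data reduced to 2 principal components (the data and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.standard_normal(200)   # a nearly redundant third variable

# 1. preprocessing: center each variable and scale by its standard deviation
Xc = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. covariance matrix and its eigendecomposition
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                      # eigenvalues in ascending order

# 3. keep the eigenvectors with the largest eigenvalues and project the data onto them
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]                          # top-2 principal directions
Z = Xc @ components                                         # reduced-dimension data

print(Z.shape)                                              # (200, 2)
print(eigvals[order] / eigvals.sum())                       # fraction of variance per component
```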