Linear Models and Learning Theory

Jul 5, 2024

Overview

  • Presented by: Yaser Abu-Mostafa
  • Host: Caltech
  • Topics Covered: Linear models, error measures, noisy targets, and introductory concepts for learning theory

Key Points from the Lecture

1. Linear Models

  • Definition: Models whose output is formed from a weighted linear sum of the input variables, i.e., a signal s = w^T x.
  • Examples: Perceptron (classification), linear regression (real-valued outputs).
  • Algorithm: Linear regression computes the optimal weights with a closed-form formula rather than an iterative search.
    • It takes the inputs and outputs in matrix form and solves for the weights in one shot via the pseudo-inverse: w = (X^T X)^{-1} X^T y (see the sketch after this list).
  • Analogy: Economy cars—efficient, simple, and often sufficient.
  • Strengthening Linear Models: Use of nonlinear transformations.
    • The signal stays linear in the weights (w) even when it is nonlinear in the inputs (x) after transformation.
    • Example: features such as x1^2 and x2^2 can turn data that is separable by a circle in X into linearly separable data.
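
A minimal sketch of the one-shot solution, assuming NumPy; the data set, noise level, and variable names here are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 2

# Inputs in matrix form, with a leading column of 1s for the bias weight w0.
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + 0.1 * rng.normal(size=N)  # real-valued, slightly noisy outputs

# One-shot solution w = (X^T X)^{-1} X^T y: apply the pseudo-inverse of X to y.
# This minimizes the in-sample squared error with no iterative search.
w = np.linalg.pinv(X) @ y
print(w)  # close to true_w
```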

2. Nonlinear Transformations

  • Concept: Transform input space (X) into a feature space (Z) using a transformation (ϕ).
  • Purpose: Allows linear methods to be applied in the transformed space; the results are then mapped back to the original space.
  • Process: a conceptual cycle: data set in X -> transform via ϕ -> classify linearly in Z space -> interpret the boundary back in X space.
  • Example: because ϕ can be nonlinear, a boundary that is linear in Z corresponds to a nonlinear boundary in X (see the sketch below).
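
A minimal sketch of this cycle, assuming the (1, x1^2, x2^2) feature map mentioned above and a synthetic, circularly separable data set; all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
# Points inside a circle of radius 0.75 get label +1: not linearly separable in X.
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.75**2, 1.0, -1.0)

def phi(X):
    """Transform X space into Z space: (x1, x2) -> (1, x1^2, x2^2)."""
    return np.column_stack([np.ones(len(X)), X[:, 0]**2, X[:, 1]**2])

# Linear fit in Z space (pseudo-inverse regression, used here for classification).
Z = phi(X)
w_tilde = np.linalg.pinv(Z) @ y
pred = np.sign(Z @ w_tilde)

# The separating surface is a plane in Z but an ellipse back in X.
print("accuracy in X space:", (pred == y).mean())
```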

3. Error Measures

  • Objective: Quantify how well hypothesis (h) approximates target function (f).
  • Error Measure Definition: E(h, f), a single number quantifying how well h approximates f.
  • Pointwise Definition: a pointwise error e(h(x), f(x)) is defined on individual input points; the overall E(h, f) is an average of pointwise errors.
  • Examples: squared error e = (h(x) - f(x))^2; binary error e = [h(x) ≠ f(x)].
  • In-sample Error (E_in): the average pointwise error over the N training points, E_in(h) = (1/N) Σ_n e(h(x_n), f(x_n)).
  • Out-of-sample Error (E_out): the expected pointwise error over the entire input space, E_out(h) = E_x[e(h(x), f(x))] (see the sketch below).
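
A minimal sketch of these definitions, assuming NumPy arrays of hypothesis outputs and target values; the function names are illustrative:

```python
import numpy as np

def squared_error(h_x, f_x):
    """Pointwise squared error: e(h(x), f(x)) = (h(x) - f(x))^2."""
    return (h_x - f_x) ** 2

def binary_error(h_x, f_x):
    """Pointwise binary error: 1 if h(x) != f(x), else 0."""
    return (h_x != f_x).astype(float)

def in_sample_error(pointwise_e, h_x, f_x):
    """E_in: the average pointwise error over the training set."""
    return float(np.mean(pointwise_e(h_x, f_x)))

# Example: binary E_in on a 3-point training set (one misclassified point).
print(in_sample_error(binary_error, np.array([1, -1, 1]), np.array([1, 1, 1])))
```

E_out is the same quantity taken as an expectation over the whole input distribution; since that distribution is unknown, in practice it is estimated on held-out data.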

4. Noisy Targets

  • In Practice: targets in real-world problems are often noisy rather than deterministic.
  • Target Distribution: the target is described by the probability of y given x, P(y|x), rather than a fixed function y = f(x).
  • Noise Model: y is split into a deterministic part plus noise, y = f(x) + noise, where f(x) = E[y|x] is the expected value (see the formulation below).
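
In symbols, this standard decomposition (consistent with the lecture's formulation) reads:

```latex
y = f(x) + \epsilon,
\qquad f(x) = \mathbb{E}[y \mid x],
\qquad \mathbb{E}[\epsilon \mid x] = 0
```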

5. Revised Learning Diagram

  • Components: Target distribution, error measure, training examples, hypothesis, supervised learning flow.
  • Update: the deterministic target f is replaced by the target distribution P(y|x); a deterministic target is just the special case where P(y|x) puts all of its probability on y = f(x).

6. Learning Theory Prelude

  • Learning Feasibility: learning is feasible in a probabilistic sense: with enough data, in-sample performance is likely to track out-of-sample performance.
  • Two Key Conditions:
    • Generalization: E_in is close to E_out (made precise by the bound after this list).
    • Minimization: E_in itself is small.
  • Practical and Theoretical Perspectives: practical algorithms take care of minimizing E_in, while learning theory supplies the probabilistic guarantee that E_out follows.
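
The generalization condition is made quantitative by a Hoeffding-style bound; stated here for a finite hypothesis set of size M and N training examples, the form used early in this course:

```latex
\mathbb{P}\left[\, \lvert E_{\text{in}}(g) - E_{\text{out}}(g) \rvert > \epsilon \,\right]
\le 2 M e^{-2 \epsilon^{2} N}
```

The bound only guarantees that E_in tracks E_out with high probability; making E_in small remains a separate, algorithmic task.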

Summary

  • Upcoming Topics: Deeper dive into learning theory over the next two weeks, focusing on the theoretical underpinnings and practical implementations.
  • Practical Applications: Importance of understanding both error measures and noisy target functions in real-world applications like fingerprint verification and credit approval.