Bayesian Linear Regression

Jul 26, 2024

Overview

  • Introduction to Bayesian Linear Regression
  • How to build the statistical model for linear regression using Bayesian statistics
  • Connection to Ordinary Least Squares (OLS) and regularization methods (LASSO and Ridge)

Linear Regression Framework

Key Concepts

  • Linear model with known data matrix X (n by p)

  • Response variable Y generated by a linear combination of features:

    $$ Y = X \beta + \epsilon $$

  • Goal: estimate the coefficient vector β from the observed data.

Assumptions

  1. Errors ε are independent and normally distributed with mean 0 and variance σ².
  2. Equivalently, each Y_i is normally distributed with mean β^T X_i and variance σ² (see the simulation sketch below).
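
To make the generative model concrete, here is a minimal simulation sketch in NumPy: it draws data according to these assumptions and recovers β with the OLS closed-form solution. The sizes, true β, and σ below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                            # n observations, p features (arbitrary sizes)
sigma = 0.5                              # noise standard deviation (assumed for illustration)
beta_true = np.array([2.0, -1.0, 0.5])   # hypothetical true coefficients

X = rng.normal(size=(n, p))              # known design matrix
eps = rng.normal(0.0, sigma, size=n)     # errors ~ Normal(0, sigma^2)
Y = X @ beta_true + eps                  # linear model: Y = X beta + eps

# OLS closed form: beta_hat = (X^T X)^{-1} X^T Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_ols)                          # close to beta_true for well-conditioned X
```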

Bias-Variance Tradeoff

  • The OLS solution can have high variance, especially when features are highly correlated or p is close to n.
  • Small changes in X can then lead to large changes in the β estimates, as the sketch below illustrates.
  • Regularization methods (LASSO and Ridge) mitigate this by adding penalty terms, accepting a small increase in bias in exchange for a reduction in variance.
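
A minimal sketch of that instability, assuming two nearly collinear features: a tiny perturbation of X can noticeably move the OLS estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0.0, 0.01, size=n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
Y = X @ np.array([1.0, 1.0]) + rng.normal(0.0, 0.5, size=n)

def ols(X, Y):
    # Closed-form OLS: (X^T X)^{-1} X^T Y
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Perturb X slightly and compare the two OLS fits
X_pert = X + rng.normal(0.0, 0.005, size=X.shape)
print(ols(X, Y))        # e.g. large, mutually offsetting coefficients
print(ols(X_pert, Y))   # can change drastically under a tiny perturbation
```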

Regularization

LASSO and Ridge

  • LASSO: penalizes the L1 norm of β, promoting sparsity (some coefficients shrink exactly to zero).
  • Ridge: penalizes the squared L2 norm of β, shrinking all coefficients toward zero.

Optimization Goals

  1. Minimize errors from predictions (fit to the data).
  2. Keep the β values small (regularization); both goals combine into the objective below.
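
In symbols, the two goals combine into a single penalized least-squares objective, where λ ≥ 0 sets the strength of the penalty:

    $$ \hat{\beta}_{\text{LASSO}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 $$

    $$ \hat{\beta}_{\text{Ridge}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 $$

The first term rewards fit to the data; the penalty term keeps β small. Larger λ means heavier shrinkage.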

Bayesian Framework

Posterior Probability

  • Focus on the posterior probability of β given the observed data Y.
  • Aim to maximize this posterior probability, yielding the maximum a posteriori (MAP) estimate.

Bayes' Theorem

  • Posterior can be expressed as:

    $$ P(\beta | Y) \propto P(Y | \beta) \cdot P(\beta) $$

  • Y is the known data, β is the unknown parameter vector.

  • The marginal likelihood P(Y) does not depend on β, so it can be dropped when maximizing; only terms that depend on β matter (log form below).
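
Because the logarithm is monotone, maximizing the posterior is equivalent to maximizing its log, which splits into a likelihood term and a prior term:

    $$ \hat{\beta}_{\text{MAP}} = \arg\max_{\beta} \; \left[ \log P(Y \mid \beta) + \log P(\beta) \right] $$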

Likelihood and Prior

  • Likelihood: the probability of observing the data Y given β.
  • Prior: the distribution describing beliefs about β before any data are seen.

Choosing a Prior

  • The choice of prior affects the resulting estimate and needs careful consideration.
  • Example prior: assume β_j ~ Normal(0, τ²) independently for each j, encoding the belief that small coefficient values are more likely (derivation below).
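
Combining the Gaussian likelihood from the assumptions above with this Gaussian prior, the negative log posterior (dropping constants that do not involve β) is:

    $$ -\log P(\beta \mid Y) = \frac{1}{2\sigma^2} \|Y - X\beta\|_2^2 + \frac{1}{2\tau^2} \|\beta\|_2^2 + \text{const} $$

Multiplying through by 2σ² recovers the Ridge objective with λ = σ²/τ², so a tighter prior (smaller τ) corresponds to a stronger penalty.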

Connection Between Bayesian and Regularization Methods

  • Regularization keeps the β estimates small, mirroring the influence of a prior centered at zero.
  • The MAP optimization problem coincides with penalized least squares: a Gaussian prior on β yields Ridge regression, and a Laplace prior yields LASSO (numeric check below).
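
As a sanity check, here is a minimal numeric sketch of that correspondence: the Ridge closed-form solution with λ = σ²/τ² should match a direct numerical MAP estimate. The data, σ, and τ are assumed values chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

n, p = 100, 3
sigma, tau = 0.5, 1.0                    # assumed noise and prior scales
beta_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(n, p))
Y = X @ beta_true + rng.normal(0.0, sigma, size=n)

# Ridge closed form with lambda = sigma^2 / tau^2:
# beta_hat = (X^T X + lambda I)^{-1} X^T Y
lam = sigma**2 / tau**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# MAP estimate: minimize the negative log posterior numerically
def neg_log_posterior(beta):
    return (np.sum((Y - X @ beta) ** 2) / (2 * sigma**2)
            + np.sum(beta ** 2) / (2 * tau**2))

beta_map = minimize(neg_log_posterior, np.zeros(p)).x
print(beta_ridge)   # the two estimates agree (up to solver tolerance)
print(beta_map)
```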

Final Thoughts

  • The Bayesian approach is a different perspective rather than an inherently "better" one.
  • Questions about the choice of prior and its implications remain valid in Bayesian statistics.

Conclusion

  • Summary of Bayesian linear regression as a powerful lens for approaching linear modeling.
  • Encouragement to like and subscribe for more statistical content.