Bayesian Linear Regression

Jul 26, 2024

Overview

  • Introduction to Bayesian Linear Regression
  • How to build the statistical model for linear regression using Bayesian statistics
  • Connection to Ordinary Least Squares (OLS) and regularization methods (LASSO and Ridge)

Linear Regression Framework

Key Concepts

  • Linear model with known data matrix X (n by p)

  • Response variable Y generated by a linear combination of features:

    $$ Y = X \beta + \epsilon $$

  • Goal: estimate the coefficient vector β from the observed data.

Assumptions

  1. Errors ε are independent and normally distributed with mean 0 and variance σ².
  2. Equivalently, each Y_i is normally distributed with mean β^T X_i and variance σ² (see the simulation sketch below).
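
To make the generative model concrete, here is a minimal simulation sketch in NumPy: it draws data according to these assumptions and recovers β with the OLS closed-form solution. The sizes, true β, and σ below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                            # n observations, p features (arbitrary sizes)
sigma = 0.5                              # noise standard deviation (assumed for illustration)
beta_true = np.array([2.0, -1.0, 0.5])   # hypothetical true coefficients

X = rng.normal(size=(n, p))              # known design matrix
eps = rng.normal(0.0, sigma, size=n)     # errors ~ Normal(0, sigma^2)
Y = X @ beta_true + eps                  # linear model: Y = X beta + eps

# OLS closed form: beta_hat = (X^T X)^{-1} X^T Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_ols)                          # close to beta_true for well-conditioned X
```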

Bias-Variance Tradeoff

  • The OLS solution can have high variance, especially when features are highly correlated or p is close to n.
  • Small changes in X can then lead to large changes in the β estimates, as the sketch below illustrates.
  • Regularization methods (LASSO and Ridge) mitigate this by adding penalty terms, accepting a small increase in bias in exchange for a reduction in variance.
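
A minimal sketch of that instability, assuming two nearly collinear features: a tiny perturbation of X can noticeably move the OLS estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0.0, 0.01, size=n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
Y = X @ np.array([1.0, 1.0]) + rng.normal(0.0, 0.5, size=n)

def ols(X, Y):
    # Closed-form OLS: (X^T X)^{-1} X^T Y
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Perturb X slightly and compare the two OLS fits
X_pert = X + rng.normal(0.0, 0.005, size=X.shape)
print(ols(X, Y))        # e.g. large, mutually offsetting coefficients
print(ols(X_pert, Y))   # can change drastically under a tiny perturbation
```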

Regularization

LASSO and Ridge

  • LASSO: penalizes the L1 norm of β, promoting sparsity (some coefficients shrink exactly to zero).
  • Ridge: penalizes the squared L2 norm of β, shrinking all coefficients toward zero.

Optimization Goals

  1. Minimize errors from predictions (fit to the data).
  2. Keep the β values small (regularization); both goals combine into the objective below.
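
In symbols, the two goals combine into a single penalized least-squares objective, where λ ≥ 0 sets the strength of the penalty:

    $$ \hat{\beta}_{\text{LASSO}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 $$

    $$ \hat{\beta}_{\text{Ridge}} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 $$

The first term rewards fit to the data; the penalty term keeps β small. Larger λ means heavier shrinkage.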

Bayesian Framework

Posterior Probability

  • Focus on the posterior probability of β given the observed data Y.
  • Aim to maximize this posterior probability, yielding the maximum a posteriori (MAP) estimate.

Bayes' Theorem

  • Posterior can be expressed as:

    $$ P(\beta | Y) \propto P(Y | \beta) \cdot P(\beta) $$

  • Y is the known data, β is the unknown parameter vector.

  • The marginal likelihood P(Y) does not depend on β, so it can be dropped when maximizing; only terms that depend on β matter (log form below).
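
Because the logarithm is monotone, maximizing the posterior is equivalent to maximizing its log, which splits into a likelihood term and a prior term:

    $$ \hat{\beta}_{\text{MAP}} = \arg\max_{\beta} \; \left[ \log P(Y \mid \beta) + \log P(\beta) \right] $$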

Likelihood and Prior

  • Likelihood: the probability of observing the data Y given β.
  • Prior: the distribution describing beliefs about β before any data are seen.

Choosing a Prior

  • The choice of prior affects the resulting estimate and needs careful consideration.
  • Example prior: assume β_j ~ Normal(0, τ²) independently for each j, encoding the belief that small coefficient values are more likely (derivation below).
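
Combining the Gaussian likelihood from the assumptions above with this Gaussian prior, the negative log posterior (dropping constants that do not involve β) is:

    $$ -\log P(\beta \mid Y) = \frac{1}{2\sigma^2} \|Y - X\beta\|_2^2 + \frac{1}{2\tau^2} \|\beta\|_2^2 + \text{const} $$

Multiplying through by 2σ² recovers the Ridge objective with λ = σ²/τ², so a tighter prior (smaller τ) corresponds to a stronger penalty.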

Connection Between Bayesian and Regularization Methods

  • Regularization keeps the β estimates small, mirroring the influence of a prior centered at zero.
  • The MAP optimization problem coincides with penalized least squares: a Gaussian prior on β yields Ridge regression, and a Laplace prior yields LASSO (numeric check below).
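
As a sanity check, here is a minimal numeric sketch of that correspondence: the Ridge closed-form solution with λ = σ²/τ² should match a direct numerical MAP estimate. The data, σ, and τ are assumed values chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

n, p = 100, 3
sigma, tau = 0.5, 1.0                    # assumed noise and prior scales
beta_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(n, p))
Y = X @ beta_true + rng.normal(0.0, sigma, size=n)

# Ridge closed form with lambda = sigma^2 / tau^2:
# beta_hat = (X^T X + lambda I)^{-1} X^T Y
lam = sigma**2 / tau**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# MAP estimate: minimize the negative log posterior numerically
def neg_log_posterior(beta):
    return (np.sum((Y - X @ beta) ** 2) / (2 * sigma**2)
            + np.sum(beta ** 2) / (2 * tau**2))

beta_map = minimize(neg_log_posterior, np.zeros(p)).x
print(beta_ridge)   # the two estimates agree (up to solver tolerance)
print(beta_map)
```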

Final Thoughts

  • The Bayesian approach is a different perspective rather than an inherently "better" one.
  • Questions about the choice of prior and its implications remain valid in Bayesian statistics.

Conclusion

  • Summary of Bayesian linear regression as a powerful lens for approaching linear modeling.
  • Encouragement to like and subscribe for more statistical content.