Generalized Linear Models Lecture Notes

Jul 22, 2024

Lecture on Generalized Linear Models (GLMs)

Overview

  • Generalized Linear Models (GLMs) generalize linear models in two primary ways:
    1. Allowing for different distributions for the response variables (exponential family distributions).
    2. Relating the mean of the response to a linear predictor on a transformed scale, via a link function suited to the data distribution.

Exponential Family

  • Exponential family: a family of probability distributions that can be defined over various domains (continuous, count, binary, etc.).
  • We focus on the case where the response variable Y and the parameter θ are real-valued.
  • Canonical case: the density (or mass) function has the form of an exponential in yθ minus a normalizing term, together with a term depending on y alone (see the display below).
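
For concreteness, here is the canonical (natural-parameter) form referred to above, written without a dispersion parameter (some treatments add a scale term φ; this is a minimal version):

```latex
f(y;\theta) = \exp\{\, y\theta - b(\theta) + c(y) \,\}
% \theta : canonical (natural) parameter
% b(\theta) : cumulant (log-normalizer) function; it makes the density integrate to 1
% c(y)     : term depending on the data y alone
```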

Properties of the Canonical Exponential Family

  • Contains distributions like Gaussian, Poisson, and Bernoulli.
  • Normalization factor ensures the function integrates to 1.
  • The function b(θ) characterizes the distribution; its derivatives encode the mean and variance.
  • Derivatives: the expectation of the derivative of the log-likelihood with respect to θ is zero, which gives E[Y] = b'(θ).
  • Variances: the second derivative of the log-likelihood provides variance information, giving Var(Y) = b''(θ) (a short derivation follows this list).
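
A short derivation of the two identities above (a standard exponential-family argument, again written without a dispersion parameter):

```latex
\ell(\theta) = y\theta - b(\theta) + c(y), \qquad
\frac{\partial \ell}{\partial \theta} = y - b'(\theta)
% Taking expectations and using E[\partial \ell / \partial \theta] = 0:
E[Y] = b'(\theta)
% Using the information identity E[(\partial \ell / \partial \theta)^2] = -E[\partial^2 \ell / \partial \theta^2]:
\mathrm{Var}(Y) = E\bigl[(Y - b'(\theta))^2\bigr] = b''(\theta)
```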

Likelihood Functions

  • Log-likelihood for one observation: Log of the density function.
  • Log-likelihood for n observations: sum of the individual log-likelihoods, using conditional independence (see the display below).
  • Expectations and integrals (exchanging differentiation and integration) are used to prove the mean and variance properties above.
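
For n conditionally independent observations, the joint log-likelihood is simply the sum of the individual terms:

```latex
\ell(\theta_1,\dots,\theta_n) = \sum_{i=1}^{n} \bigl[\, y_i\theta_i - b(\theta_i) + c(y_i) \,\bigr]
```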

Conditional Distributions and Regression

  • GLMs focus on the conditional distribution of Y given X, using a systematic component (linear predictor).
  • Link function g: connects the expected value μ = E[Y | X] to the linear predictor via g(μ) = Xβ.
  • Inverse link function g⁻¹: maps the linear predictor back to the expected value, μ = g⁻¹(Xβ) (components summarized below).
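
Collecting the pieces, a GLM has three components (x_i denotes the i-th row of the design matrix X):

```latex
% Random component: the conditional distribution of the response
Y_i \mid x_i \;\sim\; \text{exponential family with mean } \mu_i
% Systematic component: the linear predictor
\eta_i = x_i^{\top}\beta
% Link: connects the mean to the linear predictor
g(\mu_i) = \eta_i, \qquad \mu_i = g^{-1}(x_i^{\top}\beta)
```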

Common Link Functions

  • Identity link: Used in linear regression, assumes no transformation is needed.
  • Log link: Common for Poisson distributions (μ > 0).
  • Logit link (log(μ / (1-μ))): maps probabilities in (0, 1) to the real line; suitable for the Bernoulli distribution.
  • Probit link: Uses the inverse of the normal CDF.
  • Complementary log-log: another option for binary outcomes (see the code sketch after this list).
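
As a small illustration (my own sketch, not code from the lecture), the links above and their inverses can be written with numpy/scipy; the dictionary name `links` and the layout are assumptions for this example:

```python
import numpy as np
from scipy.stats import norm

# Each link g maps the mean mu to the linear-predictor scale eta;
# its inverse maps eta back to mu.
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log":      (np.log, np.exp),                          # Poisson: mu > 0
    "logit":    (lambda mu: np.log(mu / (1 - mu)),
                 lambda eta: 1 / (1 + np.exp(-eta))),      # Bernoulli: mu in (0, 1)
    "probit":   (norm.ppf, norm.cdf),                      # inverse normal CDF and normal CDF
    "cloglog":  (lambda mu: np.log(-np.log(1 - mu)),
                 lambda eta: 1 - np.exp(-np.exp(eta))),    # complementary log-log
}

mu = np.array([0.2, 0.5, 0.8])
g, g_inv = links["logit"]
print(g_inv(g(mu)))  # recovers [0.2, 0.5, 0.8]
```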

Canonical Links

  • Canonical link: the link for which the linear predictor equals the canonical parameter θ of the exponential family (θ = Xβ).
  • Provides a natural default choice of link when modeling a given distribution (see the relation below).
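
Because the mean satisfies μ = b'(θ), the canonical link is simply the inverse of b':

```latex
\mu = b'(\theta) \;\Longrightarrow\; g_{\text{canonical}}(\mu) = (b')^{-1}(\mu),
\qquad \text{so that } \theta_i = \eta_i = x_i^{\top}\beta .
```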

Examples

  • Bernoulli distribution: the PMF can be expressed in canonical form; the canonical link is the logit function (derivation after this list).
  • Poisson distribution: the PMF can be expressed in canonical form; the canonical link is the log.
  • The Gamma distribution and others have canonical links derived from their exponential-family representations.
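
As a check on the Bernoulli example, rearranging the PMF into canonical form exposes the logit as the canonical link:

```latex
p(y;\mu) = \mu^{y}(1-\mu)^{1-y}
         = \exp\Bigl\{\, y \log\tfrac{\mu}{1-\mu} + \log(1-\mu) \Bigr\}
% Hence \theta = \log\frac{\mu}{1-\mu} (the logit) and b(\theta) = \log(1 + e^{\theta}).
```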

Generalized Linear Models (GLMs) Summary

  • GLMs extend linear models to accommodate different distributions and link functions suitable for data properties.
  • Using canonical links simplifies the modeling process by leveraging properties of the exponential family.
  • Log-likelihood and derivatives provide key insights into distribution characteristics and aid in parameter estimation.

Detailed Example: Poisson Distribution

  • PMF: P(Y=y) = (e^(-µ) * µ^y) / y!, with canonical parameter θ = log µ.
  • Transformation: restating the PMF in canonical form simplifies the expectation and variance calculations (worked out below).
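
Written out, the canonical-form restatement and the resulting moments are:

```latex
\log P(Y=y) = y\log\mu - \mu - \log y!
            = y\theta - e^{\theta} - \log y!,
\qquad \theta = \log\mu, \quad b(\theta) = e^{\theta}
% Therefore E[Y] = b'(\theta) = e^{\theta} = \mu and Var(Y) = b''(\theta) = \mu.
```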

Beta as the Key Parameter

  • β parameter: the chain β → η = Xβ → μ → θ organizes the model, simplifying fitting and the interpretation of relationships.
  • Focus on the log-likelihood as a function of β when maximizing the likelihood (displayed below).
  • Writing the likelihood directly in terms of β yields simplified expressions and avoids complicated integrals.
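
With a canonical link (so that θ_i = x_i^T β) and no dispersion parameter, the log-likelihood and score as functions of β take simple forms:

```latex
\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i\, x_i^{\top}\beta - b(x_i^{\top}\beta) + c(y_i) \,\bigr],
\qquad
\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} \bigl(y_i - \mu_i\bigr)\, x_i
```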

Practical Implementation

  • Iterative approaches such as the Newton-Raphson method and iteratively reweighted least squares (IRLS) are used for parameter estimation (a sketch follows).
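
A minimal IRLS sketch for a Poisson GLM with log link (illustrative only; the function name, simulated data, and defaults are my own, and there are no numerical safeguards):

```python
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-8):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta               # linear predictor
        mu = np.exp(eta)             # inverse of the log link
        W = mu                       # working weights (Var(Y_i) = mu_i for Poisson)
        z = eta + (y - mu) / mu      # working (adjusted) response
        XtW = X.T * W                # X^T diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least-squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Example on simulated data (assumed, not from the lecture)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
print(irls_poisson(X, y))  # estimates should be close to [0.5, 0.8]
```

Each iteration solves the weighted least-squares problem that the Newton step produces for this model; for canonical links the Newton-Raphson and Fisher-scoring updates coincide.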

Upcoming Topics

  • Review the concepts and practice using convex optimization principles for parameter estimation.
  • Advanced discussion of more recent (post-1975) topics in statistics, plus general Q&A.