Generalized Linear Models Lecture Notes

Jul 22, 2024

Lecture on Generalized Linear Models (GLMs)

Overview

Generalized Linear Models (GLMs) generalize linear models in two primary ways:
1. Allowing for different distributions for the response variables (exponential family distributions).
2. Fitting a linear predictor to a transformed scale, suitable for the data distribution.

Exponential Family

Exponential family: A class of probability distributions that can be defined over various domains.
Here the focus is on the case where the response variable (Y) and the parameter (θ) are both real-valued.
Canonical case: The density has the form f(y; θ) = exp((yθ − b(θ))/φ + c(y, φ)), where φ is a dispersion parameter and c(y, φ) does not depend on θ.

Properties of the Canonical Exponential Family

Contains distributions like Gaussian, Poisson, and Bernoulli.
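The normalization term b(θ) ensures the density integrates (or sums) to 1.
The function b(θ) characterizes the distribution: its derivatives encode the mean and the variance.
Derivatives: The expectation of the derivative of the log-likelihood with respect to θ is zero, which yields E[Y] = b′(θ).
Variances: The expected second derivative of the log-likelihood equals minus the expected squared first derivative, which yields Var(Y) = φ·b″(θ), where φ is the dispersion parameter (φ = 1 for Poisson and Bernoulli).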
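
The two identities above can be written out step by step. The sketch below assumes the usual canonical parametrization with a dispersion parameter φ and a term c(y, φ) that does not depend on θ; apart from b(θ), this notation is an assumption rather than something fixed by these notes.

```latex
\begin{align*}
f_\theta(y) &= \exp\!\left(\frac{y\theta - b(\theta)}{\phi} + c(y,\phi)\right),
\qquad
\ell(\theta) = \log f_\theta(Y) = \frac{Y\theta - b(\theta)}{\phi} + c(Y,\phi), \\
\mathbb{E}\!\left[\frac{\partial \ell}{\partial \theta}\right] = 0
&\;\Longrightarrow\;
\frac{\mathbb{E}[Y] - b'(\theta)}{\phi} = 0
\;\Longrightarrow\;
\mathbb{E}[Y] = b'(\theta), \\
\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial \theta^2}\right]
+ \mathbb{E}\!\left[\left(\frac{\partial \ell}{\partial \theta}\right)^{2}\right] = 0
&\;\Longrightarrow\;
-\frac{b''(\theta)}{\phi} + \frac{\operatorname{Var}(Y)}{\phi^{2}} = 0
\;\Longrightarrow\;
\operatorname{Var}(Y) = \phi\, b''(\theta).
\end{align*}
```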

Likelihood Functions

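Log-likelihood for one observation: The log of the density function, viewed as a function of the parameter.
Log-likelihood for n independent observations: The sum of the individual log-likelihoods, because the joint density of (conditionally) independent observations factorizes into a product.
The mean and variance properties are proved by differentiating the identity ∫ f(y; θ) dy = 1 with respect to θ and exchanging the derivative with the integral.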
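
As a numerical sanity check of these properties, the sketch below uses the Poisson distribution in canonical form (θ = log μ, b(θ) = e^θ, dispersion 1) and approximates expectations by a truncated sum over y. The function names and the truncation point `y_max` are illustrative choices, not part of the lecture.

```python
import math

def poisson_log_density(y, theta):
    """Log of the canonical-form Poisson PMF: y*theta - b(theta) - log(y!), with b(theta) = exp(theta)."""
    return y * theta - math.exp(theta) - math.lgamma(y + 1)

def log_likelihood(ys, theta):
    """Log-likelihood of independent observations: the sum of the individual log-densities."""
    return sum(poisson_log_density(y, theta) for y in ys)

def expectation(f, theta, y_max=200):
    """Approximate E[f(Y)] by summing f(y) * P(Y = y) over y = 0..y_max (assumed large enough)."""
    return sum(f(y) * math.exp(poisson_log_density(y, theta)) for y in range(y_max + 1))

theta = math.log(3.0)             # canonical parameter for mean mu = 3
b_prime = math.exp(theta)         # b'(theta)
b_double_prime = math.exp(theta)  # b''(theta)

# Score for one observation: d/dtheta log f = y - b'(theta); its expectation should be 0.
mean_score = expectation(lambda y: y - b_prime, theta)
# E[score^2] should equal b''(theta), which is Var(Y) when the dispersion is 1.
mean_score_sq = expectation(lambda y: (y - b_prime) ** 2, theta)

print(mean_score, mean_score_sq, b_double_prime)
```

Up to floating-point error, the first printed value should be close to zero and the second should match b″(θ).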

Conditional Distributions and Regression

GLMs model the conditional distribution of Y given X through a systematic component, the linear predictor Xβ.
Link function (g): Connects the expected value μ = E[Y | X] of the distribution to the linear predictor, g(μ) = Xβ.
Inverse link function (g⁻¹): Maps the linear predictor back to the expected value, μ = g⁻¹(Xβ).

Common Link Functions

Identity link (g(μ) = μ): Used in linear regression, where no transformation is needed.
Log link (g(μ) = log(μ)): Common for Poisson distributions, where μ > 0.
Logit link (g(μ) = log(μ / (1-μ))): Maps probabilities in (0,1) to the real line, suitable for the Bernoulli distribution.
Probit link (g(μ) = Φ⁻¹(μ)): Uses the inverse of the standard normal CDF Φ; an alternative for binary outcomes.
Complementary log-log (g(μ) = log(-log(1-μ))): Another option for binary outcomes.
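
The links listed above can be sketched as pairs (g, g⁻¹) using only the Python standard library. The dictionary name `LINKS` and the test value 0.3 are illustrative assumptions; `statistics.NormalDist` supplies the standard normal CDF and its inverse for the probit link.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link

# Each entry maps a link name to (g, g_inverse), where g(mu) = eta and g_inverse(eta) = mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)), lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (lambda mu: _STD_NORMAL.inv_cdf(mu), lambda eta: _STD_NORMAL.cdf(eta)),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)), lambda eta: 1.0 - math.exp(-math.exp(eta))),
}

# Round trip: applying g and then g_inverse should recover mu.
for name, (g, g_inv) in LINKS.items():
    mu = 0.3
    print(name, g(mu), g_inv(g(mu)))
```

The value 0.3 lies in the domain of every link here; the logit, probit, and complementary log-log links are only defined for μ strictly between 0 and 1.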

Canonical Links

Canonical link: The link g = (b′)⁻¹, under which the systematic component equals the canonical parameter of the exponential family, θ = Xβ.
It provides a natural default choice of link function when modeling a specific distribution.
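
Why the canonical link makes the linear predictor equal to θ can be seen in one line, assuming the mean identity μ = b′(θ) of the canonical exponential family and writing the linear predictor for a single observation as X^⊤β (a notational assumption):

```latex
\begin{align*}
\mu &= \mathbb{E}[Y \mid X] = b'(\theta), \qquad g(\mu) = X^\top \beta, \\
g &= (b')^{-1}
\;\Longrightarrow\;
\theta = (b')^{-1}(\mu) = g(\mu) = X^\top \beta .
\end{align*}
```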

Examples

Bernoulli distribution: PMF can be expressed in canonical form; canonical link is the logit function.
Poisson distribution: PMF can be expressed in canonical form with θ = log(μ); canonical link is the log function.
Gamma distribution and others have respective canonical links derived from their exponential family representations.
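
For the Bernoulli case, the rewriting into canonical form can be sketched as follows, with p denoting the success probability (so μ = p) and dispersion φ = 1:

```latex
\begin{align*}
P(Y = y) &= p^{y}(1-p)^{1-y}, \qquad y \in \{0, 1\}, \\
&= \exp\!\left( y \log\frac{p}{1-p} + \log(1-p) \right), \\
\theta &= \log\frac{p}{1-p}, \qquad b(\theta) = \log\!\left(1 + e^{\theta}\right), \qquad \phi = 1, \\
b'(\theta) &= \frac{e^{\theta}}{1 + e^{\theta}} = p = \mu
\;\Longrightarrow\;
g(\mu) = (b')^{-1}(\mu) = \log\frac{\mu}{1-\mu}.
\end{align*}
```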

Generalized Linear Models (GLMs) Summary

GLMs extend linear models to accommodate different distributions and link functions suitable for data properties.
Using canonical links simplifies the modeling process by leveraging properties of the exponential family.
Log-likelihood and derivatives provide key insights into distribution characteristics and aid in parameter estimation.

Detailed Example: Poisson Distribution

PMF: P(Y=y) = (e^(-μ) * μ^y) / y!, where θ = log(μ) in canonical form.
Transformation: Rewriting the PMF as exp(yθ - e^θ - log(y!)) gives b(θ) = e^θ, so the expectation and variance follow by differentiation: E[Y] = Var(Y) = e^θ = μ.
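
Written out, the Poisson PMF in canonical form and the resulting moments are as follows. The labels φ and c(y, φ) follow the usual canonical-family notation and are an assumption about the lecture's symbols.

```latex
\begin{align*}
P(Y = y) &= \frac{e^{-\mu}\mu^{y}}{y!}
= \exp\!\left( y \log\mu - \mu - \log y! \right), \\
\theta &= \log\mu, \qquad b(\theta) = e^{\theta}, \qquad \phi = 1, \qquad c(y,\phi) = -\log y!, \\
\mathbb{E}[Y] &= b'(\theta) = e^{\theta} = \mu, \qquad
\operatorname{Var}(Y) = \phi\, b''(\theta) = e^{\theta} = \mu .
\end{align*}
```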

Beta as the Key Parameter

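β parameter: The chain runs from β to μ to θ: the linear predictor Xβ gives μ = g⁻¹(Xβ), and μ gives θ = (b′)⁻¹(μ), which organizes the modeling and interpretation of relationships.
Focus on the log-likelihood as a function of β when maximizing likelihood.
This yields simplified expressions and avoids complicated integrals, because the mean and variance come from derivatives of b(θ).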
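
To make "log-likelihood as a function of β" concrete, here is a minimal sketch for Poisson regression with the canonical log link, where θ_i = x_i·β. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def linear_predictor(x, beta):
    """Dot product x . beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def poisson_loglik(beta, X, y):
    """l(beta) = sum_i [ y_i * eta_i - exp(eta_i) - log(y_i!) ], with eta_i = x_i . beta."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(xi, beta)
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total

def poisson_score(beta, X, y):
    """Gradient of l(beta): sum_i (y_i - mu_i) * x_i, with mu_i = exp(x_i . beta)."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(linear_predictor(xi, beta))
        for j, xij in enumerate(xi):
            grad[j] += (yi - mu) * xij
    return grad

# Hypothetical toy data: the first column of X is an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 2, 4, 9]
beta = [0.1, 0.5]
print(poisson_loglik(beta, X, y), poisson_score(beta, X, y))
```

Because the link is canonical, the gradient takes the simple residual form Σ (y_i − μ_i) x_i, with no integrals to evaluate.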

Practical Implementation

Parameters are estimated with iterative methods such as the Newton-Raphson method and iteratively reweighted least squares (IRLS).
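
A minimal sketch of iteratively reweighted least squares, assuming a Poisson response with the canonical log link (in which case the IRLS update coincides with the Newton-Raphson step). The helper `solve`, the function name `irls_poisson`, the starting point β = 0, and the toy data are all illustrative assumptions, not the lecture's implementation.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def irls_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        eta = [sum(xj * bj for xj, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link: z = eta + (y - mu) / mu, w = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted normal equations: (X^T W X) beta = X^T W z.
        XtWX = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
                for j in range(p)]
        XtWz = [sum(wi * xi[j] * zi for wi, xi, zi in zip(w, X, z)) for j in range(p)]
        new_beta = solve(XtWX, XtWz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Hypothetical toy data: the first column of X is an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 2, 4, 9]
beta_hat = irls_poisson(X, y)
print(beta_hat)
```

At convergence the score Σ (y_i − μ_i) x_i should be numerically zero. Production code would normally rely on an established statistics library and a more careful starting value.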

Upcoming Topics

Review the concepts and practice using convex optimization principles for parameter estimation.
Advanced discussions on newer topics post-1975 in statistics and general Q&A.
