Inference with Generalized Linear Models

Jul 2, 2024

Inference with Generalized Linear Models (GLMs)

Preliminaries

  • Response variable (y_i): Observed response for observation i; μ_hat_i is the estimator of its expected value μ_i.
  • Two Extreme Positions:
    • Null Model: Each response gets the same estimate (the global average, denoted ȳ). This model is overly simple.
    • Saturated/Full Model: Each response is estimated by its own observed value (μ_hat_i = y_i), which overfits.

Deviance Statistic

  • Likelihood Ratio Test Statistic: Compares the fitted regression model against the saturated (full) model.
  • Log Likelihood: Written in terms of the parameter θ and the mean μ_i for observation i; substituting θ as a function of the mean yields the log likelihood in terms of μ_i.

Specific Regression Model

  • Log Likelihood: L evaluated at μ_hat; this is compared with the log likelihood of the full model.
  • Likelihood Ratio Test Statistic: Compares log likelihood of regression model against saturated model. Formula: -2 * log(L(μ_hat) / L(saturated model)).
  • Scaled Deviance Statistic: Deviance divided by the dispersion parameter (Φ).
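
Written out, the quantities above can be related as follows. This is a sketch under the common convention that the log likelihood ℓ already contains the dispersion parameter Φ, so that the likelihood ratio statistic is the scaled deviance; the symbols ℓ, D and D* are notation assumed here, not taken from the notes.

```latex
D^{*} \;=\; -2 \log \frac{L(\hat{\mu})}{L(y)}
      \;=\; 2\,\bigl[\ell(y;\, y) - \ell(\hat{\mu};\, y)\bigr]
      \;=\; \frac{D}{\Phi}
```

Here L(y) denotes the likelihood of the saturated model (μ_hat_i = y_i) and D is the unscaled deviance.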

Poisson Regression

  • Dispersion Parameter: Fixed, equals 1. Deviance and scaled deviance are the same.
  • General Setting: Dispersion parameter is unknown and must be estimated; the scaled deviance then depends on that estimate.
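
As an illustration of the Poisson case, the sketch below computes the deviance of the null model and of the saturated model for a small set of counts. The data are made up, the function name poisson_deviance is hypothetical, and the closed form 2 * Σ [y log(y/μ) - (y - μ)] is the standard Poisson deviance rather than a formula quoted from the notes.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)).

    The term y*log(y/mu) is taken to be 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Hypothetical count data (not from the notes).
y = [2, 0, 5, 3, 1, 4]
y_bar = sum(y) / len(y)

# Null model: every observation gets the global average.
dev_null = poisson_deviance(y, [y_bar] * len(y))

# Saturated model: every observation is fitted by its own value.
dev_saturated = poisson_deviance(y, [float(v) for v in y])

# With the Poisson dispersion fixed at 1, scaled deviance equals deviance.
phi = 1.0
scaled_dev_null = dev_null / phi

print(dev_null, dev_saturated, scaled_dev_null)
```

The saturated model reproduces the data exactly, so its deviance is zero; any regression model falls between the two extremes.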

Example: Normal Linear Regression Model

  • Log Likelihood: Function of the mean μ_i and the variance parameter (σ²).
  • Mean (μ_i): Expressed as Xᵢᵀ * β, using the identity link function; the dispersion parameter is σ².
  • Scaled Deviance Statistic: Sum of squared differences between the response (y_i) and its estimate (μ_hat_i), divided by σ².
  • Weighted Sum of Squares: Add a weight wᵢ for observation-specific weighting.
  • Residual Sum of Squares: The scaled deviance is the familiar residual sum of squares divided by σ².
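
The normal case can be checked numerically with a few lines. The sketch below assumes σ² is known and uses made-up responses, fitted values and weights; scaled_deviance_normal is a hypothetical helper name.

```python
def scaled_deviance_normal(y, mu_hat, sigma2, weights=None):
    """(Weighted) residual sum of squares divided by sigma^2."""
    if weights is None:
        weights = [1.0] * len(y)
    wrss = sum(w * (yi - mi) ** 2 for w, yi, mi in zip(weights, y, mu_hat))
    return wrss / sigma2

# Hypothetical responses and fitted values (not from the notes).
y = [1.0, 2.0, 4.0]
mu_hat = [1.5, 2.0, 3.5]

# Unweighted: (0.25 + 0 + 0.25) / 0.5
unweighted = scaled_deviance_normal(y, mu_hat, sigma2=0.5)

# Weighted: the first observation counts twice, (0.5 + 0 + 0.25) / 0.5
weighted = scaled_deviance_normal(y, mu_hat, sigma2=0.5, weights=[2.0, 1.0, 1.0])

print(unweighted, weighted)  # 1.0 1.5
```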

Pearson Statistic (χ²)

  • Definition: Sum of squared differences between the response (y_i) and the fitted mean (μ_hat_i), each divided by the variance function evaluated at μ_hat_i.
  • Example (Normal Linear Regression): The variance function is constant (with σ² as the dispersion parameter), so the Pearson statistic reduces to the sum of squared residuals.
  • Distributional Results: Both the deviance and the Pearson statistic are asymptotically distributed as Φ * χ² with n - (p + 1) degrees of freedom, where p + 1 is the number of regression parameters (including the intercept).
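
A minimal sketch of the Pearson statistic follows, with the variance function passed in as an argument. The fitted means are invented for illustration, the parameter count is an assumption, and dividing the statistic by its degrees of freedom is shown only as one common way to estimate Φ suggested by the distributional result above.

```python
def pearson_statistic(y, mu_hat, variance_fn):
    """Sum of (y_i - mu_hat_i)^2 / V(mu_hat_i)."""
    return sum((yi - mi) ** 2 / variance_fn(mi) for yi, mi in zip(y, mu_hat))

# Hypothetical responses and fitted means (not from the notes).
y = [2, 0, 5, 3, 1, 4]
mu_hat = [1.0, 1.0, 4.0, 4.0, 2.0, 3.0]

# Poisson: variance function V(mu) = mu.
x2_poisson = pearson_statistic(y, mu_hat, lambda m: m)

# Normal: constant variance function, so the statistic is the
# plain sum of squared residuals.
x2_normal = pearson_statistic(y, mu_hat, lambda m: 1.0)

# Assumed number of regression parameters (p + 1), for illustration only.
n_params = 2
phi_hat = x2_poisson / (len(y) - n_params)

print(x2_poisson, x2_normal, phi_hat)
```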

Drop in Deviance Test

  • Comparison: Null hypothesis model vs alternative hypothesis model.
  • Drop in Deviance Statistic: Difference between scaled deviance from reduced model (null) and larger model (alternative).
  • Likelihood Ratio Test: Compare log likelihoods under null and alternative hypotheses.
  • Decision Making: Compare the observed drop in deviance against a χ² quantile to reject or fail to reject the null hypothesis.
  • Dispersion Parameter Known: Directly compare with the χ² distribution.
  • Dispersion Parameter Unknown: Estimate Φ by the deviance of the alternative model divided by its degrees of freedom.
  • F Test Statistic: Ratio of the drop in deviance divided by q (the number of parameters restricted under the null hypothesis) to the estimator of Φ. Compare the observed F value with an F distribution quantile.
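
The two variants of the test can be sketched as below. All deviances and degrees of freedom are invented numbers, and the helper names are hypothetical. To stay within the standard library, the χ² tail probability is only implemented for q = 1, where it has the closed form erfc(sqrt(x / 2)); a statistics package would normally supply general χ² and F quantiles.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x), using the closed form erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

def drop_in_deviance(dev_null, dev_alt, phi=1.0):
    """Drop in scaled deviance between the null and alternative model."""
    return (dev_null - dev_alt) / phi

def f_statistic(dev_null, dev_alt, q, resid_df_alt):
    """F statistic when phi is unknown.

    phi is estimated by the alternative model's deviance divided by
    its residual degrees of freedom.
    """
    phi_hat = dev_alt / resid_df_alt
    return ((dev_null - dev_alt) / q) / phi_hat

# Hypothetical values (not from the notes).
dev_null, dev_alt = 30.0, 24.0
q = 1              # one parameter restricted under the null
resid_df_alt = 48  # residual degrees of freedom of the alternative model

# Known dispersion (phi = 1, e.g. Poisson): compare with chi-square, 1 df.
drop = drop_in_deviance(dev_null, dev_alt)   # 6.0
p_value = chi2_sf_1df(drop)
reject_at_5pct = p_value < 0.05

# Unknown dispersion: F statistic, to be compared with an
# F(q, resid_df_alt) quantile.
f_obs = f_statistic(dev_null, dev_alt, q, resid_df_alt)  # 12.0

print(drop, p_value, reject_at_5pct, f_obs)
```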

Types of Residuals in GLMs

  • Assessment: Evaluate model fit, the variance assumption, the choice of link function, the inclusion of covariates, etc.
  • Pearson Residuals: Difference between the response (y_i) and the fitted value (μ_hat_i), divided by the square root of the variance function evaluated at the fitted value.
  • Deviance Residuals: Signed square roots of each observation's contribution to the deviance, so their sum of squares equals the deviance of the model under investigation.
  • Software: Pearson and deviance residuals can be extracted in statistical software after fitting the GLM.
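
Both residual types can be computed by hand for a Poisson model, as in the sketch below. The fitted means are invented rather than produced by an actual fit, and the function names are hypothetical; statistical software typically returns the same quantities directly from a fitted GLM object.

```python
import math

def pearson_residuals(y, mu_hat):
    """Poisson Pearson residuals: (y - mu_hat) / sqrt(V(mu_hat)), V(mu) = mu."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu_hat)]

def deviance_residuals(y, mu_hat):
    """Poisson deviance residuals: sign(y - mu_hat) * sqrt(d_i),

    where d_i = 2 * (y*log(y/mu_hat) - (y - mu_hat)) is the
    contribution of observation i to the deviance.
    """
    out = []
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d_i = 2.0 * (log_term - (yi - mi))
        # max() guards against tiny negative values from rounding.
        out.append(math.copysign(math.sqrt(max(d_i, 0.0)), yi - mi))
    return out

# Hypothetical responses and fitted means (not from the notes).
y = [2, 0, 5, 3, 1, 4]
mu_hat = [1.0, 1.0, 4.0, 4.0, 2.0, 3.0]

pear_res = pearson_residuals(y, mu_hat)
dev_res = deviance_residuals(y, mu_hat)

# Squared Pearson residuals sum to the Pearson statistic;
# squared deviance residuals sum to the deviance.
pearson_stat = sum(r ** 2 for r in pear_res)
deviance = sum(r ** 2 for r in dev_res)

print(pearson_stat, deviance)
```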