
Understanding General Linear Models in Statistics

Aug 8, 2024

Crash Course Statistics: General Linear Model (GLM)

Introduction

  • Presenter: Adriene Hill
  • Topic: Introduction to the General Linear Model (GLM)
  • Key Point: Flexibility of GLM in creating different models to describe the world

General Linear Model (GLM)

  • Concept: Data can be explained by two things: the model and some error
  • Model Form: Typically a line, Y = mx + b (or Y = b + mx), where b is the intercept and m is the slope

Example: Predicting Trick-or-Treaters

  • Baseline: 25 trick-or-treaters
  • Model: Number of trick-or-treaters increases by 0.01 for each middle school student
  • Prediction: 35 trick-or-treaters (based on 1000 middle school students)
  • Reality: 42 trick-or-treaters (Error = 7)
  • Key Point: Error indicates deviation from the model, not necessarily that something is wrong
  • Sources of Error: Unaccounted variables and random variation
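
The arithmetic in this example can be written out as a minimal sketch. The numbers come from the bullets above; the variable names are illustrative only.

```python
# Values taken from the trick-or-treater example; names are illustrative.
baseline = 25      # expected trick-or-treaters with no middle school students
slope = 0.01       # extra trick-or-treaters per middle school student
students = 1000    # middle school students in the area

# Model part: Y = b + m * x
predicted = baseline + slope * students

# Error part: what the model does not explain
observed = 42
error = observed - predicted

print(f"predicted = {predicted:.0f}, error = {error:.0f}")
```

The split mirrors the GLM idea directly: the observed value is the model's prediction plus an error term.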

Importance of Models

  • Purpose: Allows making inferences (e.g., number of credit card frauds in a year)
  • GLM Components: Information explained by the model and information that can't be explained

Linear Regression

  • Concept: A type of GLM that predicts an outcome using a continuous variable
  • Example: Predicting YouTube video likes based on the number of comments

Steps in Linear Regression

  1. Plot Data: Check if data fits a straight line and look for outliers
  2. Handle Outliers: Decide how to treat them using explicit criteria, since outliers can influence the regression line
  3. Check Linearity: Ensure the relationship is linear
  4. Fit Regression Model: Usually done by a computer
  5. Interpret Regression Line: The fitted line is the one that minimizes the sum of squared vertical distances from each point to the line
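
The fitting step can be sketched in plain Python. The comment and like counts below are made up for illustration, and `fit_line` is a hypothetical helper name; in practice a statistics package would normally do this step.

```python
# Hypothetical data: comments and likes for six videos (not from the source).
comments = [10, 20, 30, 40, 50, 60]
likes = [120, 190, 310, 400, 480, 610]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b + m * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(comments, likes)
print(f"likes = {intercept:.2f} + {slope:.2f} * comments")
```

These closed-form expressions give the intercept and slope that minimize the sum of squared vertical distances described in step 5.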

Components of Regression Line

  • Y-intercept: Expected likes for a video with zero comments (may not make practical sense)
  • Slope (coefficient): Indicates how much likes increase per additional comment
  • Error (Residuals): Differences between observed and predicted values

Residuals

  • Residual Plot: Should ideally be an evenly spaced cloud
  • Concern: A pattern in the residuals indicates that the errors depend on the value of the predictor variable
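
As a rough sketch, residuals can be computed by subtracting each predicted value from the observed one. The data are hypothetical, and the line is fitted with the standard least-squares formulas for one predictor.

```python
# Hypothetical data: comments and likes for six videos (not from the source).
comments = [10, 20, 30, 40, 50, 60]
likes = [120, 190, 310, 400, 480, 610]

n = len(comments)
x_bar = sum(comments) / n
y_bar = sum(likes) / n

# Least-squares slope and intercept for a single predictor.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(comments, likes))
         / sum((x - x_bar) ** 2 for x in comments))
intercept = y_bar - slope * x_bar

# Residual = observed value minus predicted value.
residuals = [y - (intercept + slope * x) for x, y in zip(comments, likes)]

for x, r in zip(comments, residuals):
    print(f"comments = {x:3d}   residual = {r:8.2f}")
```

Plotting these residuals against the number of comments gives the residual plot. For a least-squares line with an intercept the residuals sum to zero, so what matters is whether they show a pattern, not their average.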

Statistical Tests: F-test

  • Purpose: Quantify how well the data fit the distribution expected under the null hypothesis
  • Null Hypothesis: No relationship between comments and likes
  • Observed Model vs. Null Model: Compare the model fitted to the actual data with the model in which the null hypothesis is true

F-test Calculation

  1. Notation: Y-hat (predicted value), Y-bar (mean value)
  2. Sum of Squares Total (SST): Total variation in the data around the mean (Y-bar)
  3. Sum of Squares for Regression (SSR): Variation explained by the model
  4. Sum of Squares for Error (SSE): Variation not explained by the model
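  5. Degrees of Freedom: Reflect the amount of independent information available
  6. F-statistic: Ratio of SSR to SSE, each divided by its degrees of freedom; a large value means the model explains much more variation than it leaves unexplained
  7. P-value: Probability of obtaining an F-statistic as large or larger if the null hypothesis is true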
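
The calculation steps above can be sketched as follows for a regression with one predictor. The data are hypothetical, and the degrees of freedom (1 for the regression, n - 2 for the error) are the standard values for this single-predictor case.

```python
# Hypothetical data: comments and likes for six videos (not from the source).
comments = [10, 20, 30, 40, 50, 60]
likes = [120, 190, 310, 400, 480, 610]

n = len(comments)
x_bar = sum(comments) / n
y_bar = sum(likes) / n

# Least-squares line for a single predictor.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(comments, likes))
         / sum((x - x_bar) ** 2 for x in comments))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * x for x in comments]   # predicted values

# Sums of squares.
sst = sum((y - y_bar) ** 2 for y in likes)                  # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                # explained by model
sse = sum((y - yh) ** 2 for y, yh in zip(likes, y_hat))     # unexplained

# Degrees of freedom for one predictor plus an intercept.
df_regression = 1
df_error = n - 2

# F-statistic: explained variation per df over unexplained variation per df.
f_stat = (ssr / df_regression) / (sse / df_error)

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}, F = {f_stat:.1f}")
```

SST equals SSR plus SSE, which is the "model plus error" split expressed in sums of squares. The p-value is the upper-tail probability of the F distribution with these degrees of freedom; assuming SciPy is installed, `scipy.stats.f.sf(f_stat, df_regression, df_error)` is one way to obtain it.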

Conclusion from F-test

  • Result: Reject the null hypothesis if the p-value is small (evidence of a significant relationship)
  • Comparison to t-test: For a regression with a single predictor, the two tests are equivalent; squaring the t-statistic for the slope gives the F-statistic
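
A short derivation shows why the squared t-statistic matches the F-statistic when there is one predictor. The notation is an assumption layered on the notes: m-hat is the fitted slope, S_xx is the sum of squared deviations of the predictor from its mean, and n is the number of data points.

```latex
t = \frac{\hat{m}}{\mathrm{SE}(\hat{m})}, \qquad
\mathrm{SE}(\hat{m}) = \sqrt{\frac{\mathrm{SSE}/(n-2)}{S_{xx}}}, \qquad
S_{xx} = \sum_i (x_i - \bar{x})^2

\mathrm{SSR} = \sum_i (\hat{Y}_i - \bar{Y})^2 = \hat{m}^2 S_{xx}
\quad \text{because} \quad \hat{Y}_i - \bar{Y} = \hat{m}\,(x_i - \bar{x})

t^2 = \frac{\hat{m}^2 S_{xx}}{\mathrm{SSE}/(n-2)}
    = \frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)} = F
```

The identity relies on the regression having 1 degree of freedom, so it applies to the single-predictor case only.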

Applications of Regression

  • Fields: Science, economics, political science
  • Examples: Relationship between taxes and cigarette purchases, heart rate and blood pressure
  • Caution: Regression shows correlation, not causation

Summary

  • GLM Framework: Explains data with model and error
  • Practical Examples: Predicting gas budget, roommate's anger

Next Steps: Further understanding of F-tests in future episodes.