Crash Course Statistics: General Linear Models (GLM)

Introduction

Presenter: Adriene Hill
Topic: General Linear Model (GLM)
Key Idea: Flexibility of GLMs in statistical analysis, similar to how a Transformer can change forms.

Basic Concept: Data can be explained by the model and some error.
Model Equation: Generally in the form Y = mX + b (or Y = b + mX), where b is the intercept and m is the slope
- Example: Predicting trick-or-treaters based on local middle school enrollment
- Baseline of 25 trick-or-treaters, plus 0.01 more for each middle school student
- Reality vs. Model: Predicted 35, actual 42, so error = 7
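The trick-or-treater example works out as follows. This is a minimal sketch: the enrollment of 1,000 is not stated in the notes. It is back-solved from the predicted 35, since (35 - 25) / 0.01 = 1000. The function name is hypothetical.

```python
# Trick-or-treater model from the notes: baseline 25 + 0.01 per middle school student.
BASELINE = 25
PER_STUDENT = 0.01

def predicted_trick_or_treaters(enrollment):
    """The 'model' part of data = model + error (hypothetical helper name)."""
    return BASELINE + PER_STUDENT * enrollment

# Assumed enrollment, back-solved from the predicted value of 35.
enrollment = 1000
predicted = predicted_trick_or_treaters(enrollment)
actual = 42
error = actual - predicted  # the part of the data the model does not explain

print(predicted, error)  # 35.0 7.0
```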
Error: Deviation from the model, not necessarily something 'wrong'
- Sources: Unaccounted variables, random variation

Use (linear regression): Predicts an outcome from a continuous predictor variable
- Example: Predicting YouTube video likes based on comments
Data Plotting: Visually check for linearity and outliers
- How outliers are handled (kept or removed) changes the fitted regression line

Assumption: The relationship is linear
Fitting the Model: Usually done by computers
Regression Line: Minimizes the sum of squared vertical distances (residuals) between the data points and the line
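For a single predictor, the least-squares line has a closed form: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and intercept = ȳ - slope · x̄. The following is a minimal sketch of that ordinary least squares fit. The function name `fit_least_squares` is hypothetical. In practice, `statistics.linear_regression` (Python 3.10+) computes the same quantities.

```python
def fit_least_squares(x, y):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope) minimizing sum((yi - (intercept + slope * xi))**2),
    i.e. the sum of squared vertical distances from the points to the line.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope

print(fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```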
- Equation includes y-intercept (e.g., 9104 likes for 0 comments) and slope (e.g., 6.5 likes per comment)
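With the intercept and slope from the notes, the fitted line reads Y-hat = 9104 + 6.5 × comments. The sketch below evaluates it. The 100-comment input is a hypothetical value chosen only to show the arithmetic.

```python
INTERCEPT = 9104  # predicted likes for a video with 0 comments (from the notes)
SLOPE = 6.5       # additional predicted likes per comment (from the notes)

def predicted_likes(comments):
    """Y-hat for a given number of comments (hypothetical helper name)."""
    return INTERCEPT + SLOPE * comments

print(predicted_likes(0))    # 9104.0
print(predicted_likes(100))  # 9754.0
```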
Residuals/Error: Difference between observed and predicted values
- Ideal: Residuals scattered evenly around zero, with no visible pattern

Purpose (F-test): Quantifies how well the data fit the distribution expected under the null hypothesis (no relationship)
Null Hypothesis: No relationship between predictor and outcome
- Expected slope of 0 for the regression line
- The scatter plot would look like a shapeless blob
**Notation and Calculations:**
- Y-hat: Predicted value
- Y-bar: Mean value
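- Total Variation: Sum of Squares Total (SST), the sum of squared deviations of each Y from Y-bar (the sample variance times n - 1)
- SSR (Sum of Squares for Regression): Variation explained by the model (squared deviations of Y-hat from Y-bar)
- SSE (Sum of Squares for Error): Variation not explained by the model (squared deviations of Y from Y-hat)
- SST = SSR + SSE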

**F-Statistic Formula Components:**
- Numerator: SSR divided by its degrees of freedom (1 for a single predictor)
- Denominator: SSE divided by its degrees of freedom (n - 2 for simple linear regression)
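Written out, the ratio described above takes the following form. This assumes simple linear regression with one predictor and an intercept, fitted to n data points.

```latex
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

F = \frac{\mathrm{SSR} / df_{\text{regression}}}{\mathrm{SSE} / df_{\text{error}}},
\qquad df_{\text{regression}} = 1, \quad df_{\text{error}} = n - 2
```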
p-value: Found using an F-distribution
- Example: A very small p-value (below the usual 0.05 significance level) leads to rejecting the null hypothesis
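The p-value is the probability of an F value at least this large if the null hypothesis were true. In practice you would usually call a library, for example SciPy's `scipy.stats.f.sf(F, dfn, dfd)`. This sketch stays in the standard library instead. It relies on the identity P(F > f) = I_x(d2/2, d1/2), where x = d2 / (d2 + d1 * f) and I_x is the regularized incomplete beta function. That function is evaluated with a Numerical Recipes-style continued fraction. All helper names, along with the sums of squares in the usage lines, are hypothetical.

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) on whichever side converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def f_test_p_value(f_stat, df_num, df_den):
    """Upper-tail probability P(F > f_stat) for an F(df_num, df_den) distribution."""
    if f_stat <= 0:
        return 1.0
    x = df_den / (df_den + df_num * f_stat)
    return regularized_beta(df_den / 2.0, df_num / 2.0, x)

def regression_f_test(ssr, sse, n):
    """F-test for simple linear regression (one predictor plus an intercept)."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    df_reg, df_err = 1, n - 2
    f_stat = (ssr / df_reg) / (sse / df_err)
    return f_stat, f_test_p_value(f_stat, df_reg, df_err)

# Hypothetical sums of squares and sample size, invented for illustration.
f_stat, p = regression_f_test(ssr=90.0, sse=30.0, n=22)
alpha = 0.05  # conventional significance level
print(f"F = {f_stat:.1f}, p = {p:.2g}, reject H0: {p < alpha}")
```

The closed-form cases F(2, 2), where P(F > f) = 1 / (1 + f), and F(2, d), where P(F > f) = (1 + 2f/d)^(-d/2), are handy sanity checks for a hand-rolled implementation like this one.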

**Relation to Real Life:**
- GLMs explain data through the model and error
- Examples: Deviations from a budget, predicting a roommate's anger
Future Topics: More on the importance of F-tests
Applications: Widely used in various fields (science, economics, political science) to model relationships and make predictions

Remember: Regression shows correlation, not causation.