Understanding General Linear Models in Statistics

Aug 8, 2024

Crash Course Statistics: General Linear Model (GLM)

Introduction

Presenter: Adriene Hill
Topic: Introduction to the General Linear Model (GLM)
Key Point: Flexibility of GLM in creating different models to describe the world

General Linear Model (GLM)

Concept: Data can be explained by two things: the model and some error
Model Form: Data = Model + Error, where the model is typically a line, Y = mx + b (or Y = b + mx), with slope m and intercept b

Example: Predicting Trick-or-Treaters

Baseline: 25 trick-or-treaters
Model: Number of trick-or-treaters increases by 0.01 for each middle school student
Prediction: 35 trick-or-treaters (based on 1000 middle school students)
Reality: 42 trick-or-treaters (Error = 7)
Key Point: Error indicates deviation from the model, not necessarily that something is wrong
Sources of Error: Unaccounted variables and random variation
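
The arithmetic of this example can be sketched in a few lines of Python. The numbers come from the example above; the variable and function names are only illustrative.

```python
# Numbers taken from the trick-or-treater example above.
baseline = 25      # expected trick-or-treaters with zero middle school students
slope = 0.01       # additional trick-or-treaters per middle school student
students = 1000    # middle school students in the neighborhood
observed = 42      # trick-or-treaters who actually showed up

def predict(n_students):
    """Model part of the GLM: baseline plus slope times the predictor."""
    return baseline + slope * n_students

predicted = predict(students)
error = observed - predicted   # the part of the data the model does not explain

print(f"predicted = {predicted:.0f}, error = {error:.0f}")
```

The observed value splits into the two GLM pieces: 42 = 35 (model) + 7 (error).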

Importance of Models

Purpose: Models let us make inferences (e.g., about the number of credit card frauds in a year)
GLM Components: Information explained by the model and information that can't be explained

Linear Regression

Concept: A type of GLM that predicts an outcome from a continuous predictor variable
Example: Predicting YouTube video likes based on the number of comments

Steps in Linear Regression

Plot Data: Check if data fits a straight line and look for outliers
Handle Outliers: Decide how to treat them based on set criteria, since outliers can strongly influence the regression line
Check Linearity: Ensure the relationship is linear
Fit Regression Model: Usually done by a computer
Interpret Regression Line: The fitted line is the one that minimizes the sum of squared vertical distances between each point and the line
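
As a rough illustration of the fitting step, here is a minimal least-squares sketch in plain Python. The comment and like counts are made up for illustration (they are not from the episode), and `fit_line` is a hypothetical helper name.

```python
# Hypothetical data: comments and likes for five videos (made up for illustration).
comments = [2, 4, 6, 8, 10]
likes = [12, 19, 33, 38, 48]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = co-variation of x and y divided by the variation in x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(comments, likes)
print(f"likes = {intercept:.2f} + {slope:.2f} * comments")
```

In practice a statistics package would do this step; the closed-form formulas are shown only to make "minimizes the sum of squared distances" concrete.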

Components of Regression Line

Y-intercept: Expected likes for a video with zero comments (may not make practical sense)
Slope (coefficient): Predicted change in likes for each additional comment
Error (Residuals): Differences between observed and predicted values

Residuals

Residual Plot: Should ideally look like an evenly spread cloud with no visible pattern
Concern: A pattern in the residuals indicates that the size of the error depends on the value of the predictor variable
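
A small sketch of how residuals could be computed, assuming the same made-up comments and likes data as an illustration (none of these numbers come from the episode):

```python
# Hypothetical data (made up for illustration).
comments = [2, 4, 6, 8, 10]
likes = [12, 19, 33, 38, 48]

# Fit the least-squares line.
n = len(comments)
x_bar = sum(comments) / n
y_bar = sum(likes) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(comments, likes))
         / sum((x - x_bar) ** 2 for x in comments))
intercept = y_bar - slope * x_bar

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * x for x in comments]
residuals = [y - y_hat for y, y_hat in zip(likes, predicted)]

for x, r in zip(comments, residuals):
    print(f"comments={x:2d}  residual={r:+.2f}")
```

Plotting `residuals` against `comments` gives the residual plot. With a least-squares line that includes an intercept, the residuals always sum to zero, so the thing to look for is a pattern in them, not their average.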

Statistical Tests: F-test

Purpose: Quantify how well the data fit the distribution expected under the null hypothesis
Null Hypothesis: No relationship between comments and likes
Observed Model vs. Null Model: Compare the model fitted to the actual data with the model expected if the null hypothesis were true (a flat line at the mean)

F-test Calculation

Notation: Y-hat (predicted value), Y-bar (mean value)
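Sum of Squares Total (SST): Total variation in the data, the sum of squared differences between each observed Y and Y-bar
Sum of Squares for Regression (SSR): Variation explained by the model, the sum of squared differences between each Y-hat and Y-bar
Sum of Squares for Error (SSE): Variation not explained by the model, the sum of squared differences between each observed Y and Y-hat
Degrees of Freedom: Reflect the amount of independent information; each sum of squares is divided by its degrees of freedom
F-statistic: Ratio of SSR to SSE, each divided by its degrees of freedom; a large value means the model explains far more variation than it leaves unexplained
P-value: Probability of obtaining an F-statistic as large or larger if the null hypothesis is true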
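
The calculation can be sketched as follows for a regression with one predictor. The data are made up for illustration, and the degrees of freedom used (1 for the regression, n - 2 for the error) are the standard ones for a single-predictor model.

```python
# Hypothetical data (made up for illustration).
comments = [2, 4, 6, 8, 10]
likes = [12, 19, 33, 38, 48]

n = len(comments)
x_bar = sum(comments) / n
y_bar = sum(likes) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(comments, likes))
         / sum((x - x_bar) ** 2 for x in comments))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * x for x in comments]

# Partition the total variation into explained and unexplained parts.
sst = sum((y - y_bar) ** 2 for y in likes)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the model
sse = sum((y - yh) ** 2 for y, yh in zip(likes, y_hat))  # left over (error)

df_regression = 1      # one predictor
df_error = n - 2       # n points minus two estimated parameters

# F compares explained variation per degree of freedom to unexplained.
f_stat = (ssr / df_regression) / (sse / df_error)
print(f"SST={sst:.1f}  SSR={ssr:.1f}  SSE={sse:.1f}  F={f_stat:.2f}")
```

Turning the F-statistic into a p-value requires the F distribution, which is not in the Python standard library. If SciPy is available, `scipy.stats.f.sf(f_stat, df_regression, df_error)` gives the upper-tail probability.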

Conclusion from F-test

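Result: Reject the null hypothesis if the p-value is small (below the chosen significance threshold), which is evidence of a significant relationship
Comparison to t-test: For a regression with a single predictor, the two tests are equivalent; squaring the t-statistic for the slope gives the F-statistic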
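
The link between the two statistics can be checked numerically. This is a sketch on made-up data, assuming a single predictor and the usual standard error of the slope, sqrt(MSE / Sxx).

```python
import math

# Hypothetical data (made up for illustration).
comments = [2, 4, 6, 8, 10]
likes = [12, 19, 33, 38, 48]

n = len(comments)
x_bar = sum(comments) / n
y_bar = sum(likes) / n
s_xx = sum((x - x_bar) ** 2 for x in comments)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(comments, likes)) / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * x for x in comments]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(likes, y_hat))
mse = sse / (n - 2)                 # error variance estimate

f_stat = (ssr / 1) / mse            # F-statistic for the regression
se_slope = math.sqrt(mse / s_xx)    # standard error of the slope
t_stat = slope / se_slope           # t-statistic for "slope = 0"

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed values agree up to floating-point rounding, because with one predictor SSR equals the squared slope times Sxx.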

Applications of Regression

Fields: Science, economics, political science
Examples: Relationship between taxes and cigarette purchases, heart rate and blood pressure
Caution: Regression can show that two variables are related, but not that one causes the other

Summary

GLM Framework: Explains data with model and error
Practical Examples: Predicting gas budget, roommate's anger

Next Steps: Further understanding of F-tests in future episodes.
