Understanding Linear Regression and Outliers

Dec 11, 2024

Lecture Notes: Linear Regression and Outliers

Introduction

  • Continuing from the previous session.
  • Focus on calculating a p-value for a slope in linear regression.
  • Testing the null hypothesis that \( \beta_1 = 0 \).
  • If \( \beta_1 = 0 \), the regression line is flat: the predictor carries no linear information about the response, so a linear model is not needed.

P-Value Calculation

  • Confidence Interval: If zero lies outside the confidence interval for the slope, the slope differs significantly from zero at the corresponding level.
  • Test Statistic (t):
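    • Formula: \( t = \frac{b_1 - \beta_1}{SE_{b_1}} \), where \( b_1 \) is the estimated slope and \( \beta_1 \) is its value under the null hypothesis.
    • Example calculation: setting \( \beta_1 = 0 \) with SE = 0.7 gives \( t = b_1 / 0.7 \), which is a large test statistic for the slope in this example.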
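
The formula above can be sketched in R as follows. The slope estimate b1 = 3.5 is a hypothetical value chosen only for illustration, since these notes do not record the estimate; the standard error 0.7 is the one quoted in the example.

```r
# Hypothetical slope estimate (NOT from the lecture); SE as quoted in the notes
b1      <- 3.5
se_b1   <- 0.7
beta1_0 <- 0    # value of the slope under the null hypothesis

# Test statistic: distance of the estimate from the null value, in SE units
t_stat <- (b1 - beta1_0) / se_b1
t_stat
```

Because the null value is zero, the statistic reduces to the estimate divided by its standard error.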

P-Value and Significance

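  • Critical t: about 1.96 for a two-tailed test at α = 0.05. This is the large-sample value; the exact critical value depends on the n − 2 degrees of freedom and is slightly larger in small samples. A test statistic beyond it in absolute value indicates significance.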
  • Two-Tailed Test:
    • P-value calculated in R with pt(), the cumulative distribution function of the t distribution, by doubling the tail area beyond |t|.
    • Result implies \( p < 0.001 \).
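
A minimal sketch of the two-tailed calculation with pt(). The test statistic and the sample size below are hypothetical placeholders, not the lecture's values; the degrees of freedom are n − 2 because simple linear regression estimates an intercept and a slope.

```r
t_stat <- 5       # hypothetical test statistic
n      <- 30      # hypothetical sample size
df     <- n - 2   # two coefficients are estimated

# Two-tailed p-value: double the upper-tail area beyond |t|
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_value

# Exact two-tailed critical value at alpha = 0.05, for comparison with 1.96
t_crit <- qt(0.975, df = df)
t_crit
```

Using lower.tail = FALSE avoids computing 1 - pt(...), which loses precision when the tail area is very small.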

Linear Regression Overview

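  • Ordinary Least Squares (OLS): Chooses the intercept and slope that minimize the sum of squared residuals; this is the basis for simple linear regression.
  • Single Predictor: Simple linear regression uses one predictor, unlike multiple regression, which uses two or more.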
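
As an illustration of what OLS computes with a single predictor, the sketch below applies the standard closed-form estimates to a small made-up data set (the numbers are hypothetical) and compares them with lm().

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form OLS estimates for one predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

c(intercept = b0, slope = b1)

# lm() should return the same two numbers
coef(lm(y ~ x))
```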

Practical Application in R

  • Manual Calculation vs. R Functions:
    • R allows storing models as objects (e.g., 'Bob').
    • lm() fits the model; summary() reports on the fitted object.
    • The summary provides coefficients, standard errors, t statistics, p-values, and more.
  • Confidence Intervals: Use confint() function.
  • Accessing Model Data: Residuals, fitted values, etc. can be extracted from the stored model object (for example with residuals() and fitted()).
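
The workflow above might look like the following sketch. The data are simulated and hypothetical; only the object name bob follows the lecture's example. The last lines reproduce the slope's t statistic and p-value by hand to show that summary() performs the same calculation.

```r
set.seed(1)
x   <- 1:20
y   <- 2 + 0.5 * x + rnorm(20)   # hypothetical simulated data
dat <- data.frame(x = x, y = y)

bob <- lm(y ~ x, data = dat)     # store the fitted model as an object
summary(bob)                     # coefficients, SEs, t values, p-values
confint(bob, level = 0.95)       # confidence intervals for both coefficients

head(residuals(bob))             # observed minus fitted values
head(fitted(bob))                # values predicted by the line

# Manual check of the slope's test statistic and two-tailed p-value
coefs   <- coef(summary(bob))
t_slope <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
p_slope <- 2 * pt(abs(t_slope), df = df.residual(bob), lower.tail = FALSE)
```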

Introduction to Outliers

  • New Example with realistic data containing outliers.
  • Outliers: Discussed in context of leverage and regression influence.
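  • Finite Sample Breakdown Point: The smallest fraction of observations that, if made arbitrarily extreme, can carry the estimate arbitrarily far.
    • Mean's breakdown point is \( \frac{1}{n} \): a single extreme observation is enough.
    • Median's breakdown point is 0.5: about half the observations must be corrupted before it breaks down.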
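
A small demonstration of the two breakdown points, using a made-up sample of five values in which one observation is replaced by an extreme number.

```r
x     <- c(2, 3, 4, 5, 6)   # hypothetical clean sample
x_bad <- x
x_bad[5] <- 1e6             # corrupt one observation out of n = 5

mean(x)        # 4
mean(x_bad)    # dragged far away by the single bad value
median(x)      # 4
median(x_bad)  # still 4
```

Corrupting 1 of n values is already enough to ruin the mean, while the median is unchanged.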

Detecting Outliers

  • Avoid eyeballing, which is subjective, and simple standard deviation rules, which rely on a mean and standard deviation that the outliers themselves inflate.
  • MAD (Median Absolute Deviation): Robust measure of spread.
    • Steps: Subtract the median from each value, take the absolute values of those deviations, then find their median.
    • MADN: MAD divided by 0.6745, so that it estimates the standard deviation when the data are normal.
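
The steps above can be sketched in R on a made-up sample containing one extreme value. Note that R's built-in mad() already applies the normal-consistency constant 1.4826 (approximately 1/0.6745) by default, so mad(x) corresponds to MADN and mad(x, constant = 1) to the unadjusted MAD.

```r
x <- c(2, 3, 4, 5, 6, 7, 50)   # hypothetical data with one extreme value

dev     <- x - median(x)       # step 1: deviations from the median
abs_dev <- abs(dev)            # step 2: absolute values
mad_raw <- median(abs_dev)     # step 3: median of the absolute deviations
madn    <- mad_raw / 0.6745    # rescale for the normal distribution

mad_raw
madn

mad(x, constant = 1)   # built-in unadjusted MAD
mad(x)                 # built-in adjusted version
sd(x)                  # for contrast: inflated by the extreme value
```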

Outlier Detection Rule

  • Formula: \( \left| \frac{x - \text{median}}{\text{MADN}} \right| > 2.24 \)
  • Hampel Identifier: A value satisfying the rule is flagged as an outlier; 2.24 is the constant threshold.
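
A minimal sketch of the rule as an R function. The function name mad_median_outliers and the sample data are hypothetical, and the sketch assumes MADN is not zero (it would be zero if more than half the values were identical).

```r
# Flag values whose distance from the median exceeds cutoff * MADN
# (hypothetical helper, not a function from base R)
mad_median_outliers <- function(x, cutoff = 2.24) {
  med  <- median(x)
  madn <- median(abs(x - med)) / 0.6745
  abs(x - med) / madn > cutoff
}

x     <- c(2, 3, 4, 5, 6, 7, 50)   # hypothetical data
flags <- mad_median_outliers(x)

flags      # logical vector, TRUE where the rule fires
x[flags]   # the flagged values
```

Returning a logical vector rather than the values themselves makes it easy to drop or inspect the flagged rows of a data frame before refitting a regression.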

Wrap-Up

  • Next session will cover implementing outlier detection in R using these concepts.
  • Importance of understanding outlier effects on linear regression.