Multiple Regression Overview

Jul 15, 2024

Multiple Regression Lecture Notes

Introduction

  • Multiple regression: Used to predict a variable using more than one predictor variable.
  • Assumes familiarity with simple linear regression.

Problem Setup

  • Scenario: Regional Delivery Service (RDS) estimates delivery time.
  • Data Set: Sample of 10 past trips, recording:
    • Total miles traveled (X1)
    • Number of deliveries (X2)
    • Total travel time in hours (Y)
  • Goal: Predict total travel time (Y) using miles traveled (X1) and number of deliveries (X2).

Key Concepts

Variables

  • Dependent Variable (Y): Travel time
  • Independent Variables (X1, X2): Miles traveled and number of deliveries

Terminology

  • Predictor Variables: Independent variables
  • Response Variable: Dependent variable
  • These notes use the terms independent variable and dependent variable throughout.

Multiple Regression Overview

  • Extension of Simple Linear Regression: Uses multiple independent variables to predict a single dependent variable.
    • Simple Linear Regression: 1-to-1 relationship (1 X, 1 Y)
    • Multiple Regression: Many-to-1 relationship (multiple Xs, 1 Y)

Important Considerations

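  • Overfitting: Adding too many independent variables can degrade the model's ability to predict new data.
    • An added variable never reduces the variation explained in the sample, even when it has no real relationship with Y, so the apparent improvement can be misleading.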
    • Ideal: Select the best variables for the model.
  • Multicollinearity: When independent variables are correlated with each other.
    • Ideal: Independent variables are correlated with the dependent variable but not with each other.
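
The overfitting point above can be checked numerically. The sketch below is an illustration, not part of the lecture: it generates synthetic trip data (the lecture's 10 trips are not reproduced here, so every number is made up), fits the model with and without a pure-noise variable, and compares the share of variation explained. The helper name `r_squared` and all coefficients used to simulate the data are assumptions.

```python
import numpy as np

def r_squared(X, y):
    """Share of variation in y explained by a least-squares fit on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total variation
    return 1.0 - ss_res / ss_tot

# Synthetic data only: these are NOT the RDS trips from the lecture.
rng = np.random.default_rng(0)
n = 10
miles = rng.uniform(50, 110, n)
deliveries = rng.integers(1, 7, n).astype(float)
hours = 1.0 + 0.06 * miles + 0.9 * deliveries + rng.normal(0, 0.5, n)

# A variable with no relationship to travel time, by construction.
noise = rng.normal(size=n)

r2_two = r_squared(np.column_stack([miles, deliveries]), hours)
r2_three = r_squared(np.column_stack([miles, deliveries, noise]), hours)

print(f"R^2 with miles and deliveries: {r2_two:.4f}")
print(f"R^2 after adding pure noise:   {r2_three:.4f}")
```

The second value is never lower than the first, which is why a rising share of explained variation alone does not justify keeping a variable.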

Preparations for Multiple Regression

  • Pre-work: Conduct proper data analysis before running multiple regression.
    • Use correlations, scatter plots, and simple regressions.
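
A minimal sketch of this pre-work, assuming NumPy and synthetic stand-in data (the lecture's actual trips are not reproduced, so all values are made up). It prints the pairwise correlations, where the miles-deliveries entry is the one to watch for multicollinearity, and runs one simple regression per independent variable. Scatter plots are omitted to keep the example dependency-free.

```python
import numpy as np

# Synthetic data only: these are NOT the RDS trips from the lecture.
rng = np.random.default_rng(1)
n = 10
miles = rng.uniform(50, 110, n)
deliveries = rng.integers(1, 7, n).astype(float)
hours = 1.0 + 0.06 * miles + 0.9 * deliveries + rng.normal(0, 0.5, n)

names = ["miles", "deliveries", "hours"]
data = np.column_stack([miles, deliveries, hours])

# Pairwise correlations; rowvar=False treats each column as one variable.
corr = np.corrcoef(data, rowvar=False)
for name, row in zip(names, corr):
    print(f"{name:>10}", " ".join(f"{value:6.2f}" for value in row))

# Correlation between the two independent variables (multicollinearity check).
x1_x2_corr = corr[0, 1]

# Simple regressions of hours on each independent variable separately.
# np.polyfit with degree 1 returns [slope, intercept].
slope_miles, intercept_miles = np.polyfit(miles, hours, 1)
slope_deliv, intercept_deliv = np.polyfit(deliveries, hours, 1)
print(f"hours ~ miles:      slope={slope_miles:.3f}, intercept={intercept_miles:.3f}")
print(f"hours ~ deliveries: slope={slope_deliv:.3f}, intercept={intercept_deliv:.3f}")
```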

Examples and Equations

  • Simple Example: Predicting travel time using miles traveled and number of deliveries.
  • Multiple Relationships: Account for the relationship between each independent variable and the dependent variable, and for the relationships among the independent variables themselves.

Multiple Regression Equation

  • General Form:
    • $Y = β_0 + β_1X_1 + β_2X_2 + ... + β_pX_p + ε$
    • Linear in the parameters $β_0, β_1, ..., β_p$, plus an error term (ε).
  • Estimated Multiple Regression Equation: Used with sample data.
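    • $\hat{Y} = b_0 + b_1X_1 + b_2X_2 + ... + b_pX_p$
    • Note: No error term appears because $\hat{Y}$ estimates the mean of Y and ε is assumed to have a mean of zero. Individual observations still deviate from $\hat{Y}$.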
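
As a rough illustration of how the estimates $b_0, b_1, b_2$ might be obtained from sample data, the sketch below uses ordinary least squares via `numpy.linalg.lstsq`. The data are synthetic stand-ins (not the lecture's trips), and the choice of least squares is an assumption, since the notes do not name the estimation method.

```python
import numpy as np

# Synthetic data only: these are NOT the RDS trips from the lecture.
rng = np.random.default_rng(2)
n = 10
miles = rng.uniform(50, 110, n)                    # X1
deliveries = rng.integers(1, 7, n).astype(float)   # X2
hours = 1.0 + 0.06 * miles + 0.9 * deliveries + rng.normal(0, 0.5, n)  # Y

# Design matrix: a column of ones for b0, then one column per independent variable.
A = np.column_stack([np.ones(n), miles, deliveries])

# Least-squares estimates b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(A, hours, rcond=None)
b0, b1, b2 = b

# Fitted values from the estimated equation, which has no error term.
y_hat = b0 + b1 * miles + b2 * deliveries

# Residuals: what the estimated equation leaves unexplained for each trip.
residuals = hours - y_hat

print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
print("residuals:", np.round(residuals, 3))
```

The individual residuals are not zero, but with an intercept in the model they sum to zero across the sample.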

Interpreting Coefficients

  • Coefficients indicate the estimated change in Y for a one-unit change in X, holding other variables constant.
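  • Example (a separate illustration, with all variables measured in thousands of dollars):
    • $\hat{Y} = 27 + 9X_1 + 12X_2$
    • $b_1 = 9$: an estimated \$9,000 increase in Y for each \$1,000 increase in X1 (Capital Investment), holding X2 constant.
    • $b_2 = 12$: an estimated \$12,000 increase in Y for each \$1,000 increase in X2 (Marketing Expenditures), holding X1 constant.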
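
The interpretation can be checked by evaluating the example equation directly. In this small sketch, the function name `predict` and the input values (10 and 5) are hypothetical; only the coefficients 27, 9, and 12 come from the example.

```python
def predict(x1, x2, b0=27.0, b1=9.0, b2=12.0):
    """Evaluate the example equation Y-hat = 27 + 9*X1 + 12*X2."""
    return b0 + b1 * x1 + b2 * x2

# Hypothetical starting point: X1 = 10 and X2 = 5.
base = predict(10, 5)

# Raise X1 by one unit while holding X2 constant.
more_x1 = predict(11, 5)

# Raise X2 by one unit while holding X1 constant.
more_x2 = predict(10, 6)

print("change from one more unit of X1:", more_x1 - base)
print("change from one more unit of X2:", more_x2 - base)
```

Each difference equals the corresponding coefficient, regardless of the starting point chosen.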

Summary

  • Multiple Regression: Predicts a dependent variable using multiple independent variables.
  • Be wary of overfitting and multicollinearity.
  • Interpretation of Coefficients: Change in Y for a one-unit change in X, other variables constant.
  • Next Steps: Detailed exploration of variable relationships and pre-work before conducting multiple regression.