Overview
This lecture covers how to perform linear regression analysis in SPSS Statistics, including assumptions to check, the step-by-step procedure, and interpretation of results.
Introduction to Linear Regression
- Linear regression predicts the value of a dependent variable based on an independent variable.
- The dependent variable is what we want to predict; the independent variable is the predictor.
- For more than one independent variable, use multiple regression.
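As an illustration of what SPSS does behind the dialog boxes, the line of best fit for one predictor can be computed with the textbook least-squares formulas. This is a minimal sketch, not SPSS itself; the income and price values below are made up for demonstration.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data only (income in thousands, price in thousands)
income = [20, 30, 40, 50, 60]
price = [12, 17, 21, 27, 33]
a, b = simple_linear_regression(income, price)
print(f"Price = {a:.3f} + {b:.3f} * Income")
```

The slope and intercept printed here are the same quantities SPSS reports in the B column of its Coefficients table.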
Assumptions of Linear Regression
- The dependent variable must be continuous (interval or ratio scale).
- The independent variable must be continuous (interval or ratio scale).
- There must be a linear relationship between the variables (check with a scatterplot).
- There should be no significant outliers (review using scatterplots and diagnostics).
- Observations must be independent (test with the Durbin-Watson statistic).
- Data should show homoscedasticity (equal variances along the line of best fit).
- Regression residuals (errors) should be approximately normally distributed (check with histogram or Normal P-P Plot).
- The last five assumptions (linearity, outliers, independence, homoscedasticity, and normality of residuals) can be checked using SPSS tools during the analysis.
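To make the independence assumption concrete, here is a sketch of the Durbin-Watson statistic that SPSS reports when the option is selected. This is not SPSS output; the residuals below are invented for illustration. Values near 2 suggest independent residuals, while values near 0 or 4 suggest positive or negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]  # made-up residuals
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.2f}")
```

The statistic always falls between 0 and 4; SPSS prints it in the Model Summary table when Durbin-Watson is ticked under Statistics.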
Example Scenario
- Example: predicting car price (dependent variable) from individual income (independent variable).
- A salesperson can use this relationship to suggest appropriate cars in areas with known average incomes.
Data Setup in SPSS
- Create variables in SPSS: 'Income' (independent), 'Price' (dependent), and optionally, 'caseno' for tracking cases.
- 'Caseno' helps identify and remove outliers but is not part of the regression analysis.
Procedure for Linear Regression in SPSS
- Navigate to Analyze > Regression > Linear in SPSS.
- Move 'Income' to Independent(s) and 'Price' to Dependent in the dialogue box.
- Use Statistics and Plots options to check for outliers, independence, homoscedasticity, and normality of residuals.
- Click OK to run the analysis and generate output tables.
Interpreting SPSS Output
- The Model Summary table provides R (correlation) and R² (variance explained by the model).
- The ANOVA table tests if the regression model significantly predicts the dependent variable (look for Sig. < 0.05).
- The Coefficients table gives the regression equation and indicates if the predictor is statistically significant.
- Example regression equation: Price = 8287 + 0.564(Income), i.e. each one-unit increase in income raises the predicted price by 0.564.
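The example equation above can be used directly for prediction. The coefficients come from the lecture's example output; the income value plugged in below is illustrative only.

```python
def predict_price(income):
    """Apply the lecture's example regression equation:
    Price = 8287 + 0.564 * Income."""
    return 8287 + 0.564 * income

# Predict the price for an illustrative income value
print(predict_price(15000))
```

This is exactly how the Coefficients table is turned into a working prediction: the Constant row supplies the intercept (8287) and the Income row supplies the slope (0.564).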
Key Terms & Definitions
- Dependent variable — The outcome variable being predicted.
- Independent variable — The predictor variable used in regression.
- Continuous variable — A variable measured on an interval or ratio scale.
- Linearity — The assumption that the relationship between variables is straight-line.
- Outlier — A data point far from the predicted value.
- Independence of observations — Each data point is not influenced by others.
- Homoscedasticity — Equal variance of errors along the regression line.
- Residuals — The differences between observed and predicted values.
- R² (R squared) — Proportion of variance in the dependent variable explained by the independent variable.
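The definitions of residuals and R² above fit together in a short calculation. This sketch assumes some fitted model has already produced predicted values; the observed and predicted numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed = [10, 12, 15, 19, 24]
predicted = [9, 13, 16, 18, 24]   # illustrative predictions from a model
# Residuals: observed minus predicted values
residuals = [o - p for o, p in zip(observed, predicted)]
print(residuals)
print(r_squared(observed, predicted))
```

An R² near 1 means the model explains most of the variance in the dependent variable, which is the same interpretation applied to the R² value in the SPSS Model Summary table.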
Action Items / Next Steps
- Practice entering and analyzing sample data in SPSS using the steps provided.
- Review scatterplots, diagnostics, and residual plots to check assumptions before interpreting results.
- Write out and interpret your own regression equation based on SPSS output.