Overview
This lecture covers how to perform linear regression analysis in SPSS Statistics, including assumptions to check, the step-by-step procedure, and interpretation of results.
Introduction to Linear Regression
- Linear regression predicts the value of a dependent variable based on an independent variable.
- The dependent variable is what we want to predict; the independent variable is the predictor.
- For more than one independent variable, use multiple regression.
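As an illustration of what SPSS does behind the dialog boxes, the line of best fit for one predictor can be computed with the textbook least-squares formulas. This is a minimal sketch, not SPSS itself; the income and price values below are made up for demonstration.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data only (income in thousands, price in thousands)
income = [20, 30, 40, 50, 60]
price = [12, 17, 21, 27, 33]
a, b = simple_linear_regression(income, price)
print(f"Price = {a:.3f} + {b:.3f} * Income")
```

The slope and intercept printed here are the same quantities SPSS reports in the B column of its Coefficients table.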
Assumptions of Linear Regression
- The dependent variable must be continuous (interval or ratio scale).
- The independent variable must be continuous (interval or ratio scale).
- There must be a linear relationship between the variables (check with a scatterplot).
- There should be no significant outliers (review using scatterplots and diagnostics).
- Observations must be independent (test with the Durbin-Watson statistic).
- Data should show homoscedasticity (equal variances along the line of best fit).
- Regression residuals (errors) should be approximately normally distributed (check with histogram or Normal P-P Plot).
- The last five assumptions (linearity, outliers, independence, homoscedasticity, and normality of residuals) can be checked using SPSS tools during the analysis.
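To make the independence assumption concrete, here is a sketch of the Durbin-Watson statistic that SPSS reports when the option is selected. This is not SPSS output; the residuals below are invented for illustration. Values near 2 suggest independent residuals, while values near 0 or 4 suggest positive or negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]  # made-up residuals
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.2f}")
```

The statistic always falls between 0 and 4; SPSS prints it in the Model Summary table when Durbin-Watson is ticked under Statistics.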
Example Scenario
- Example: predicting car price (dependent variable) from individual income (independent variable).
- A salesperson can use this relationship to suggest appropriate cars in areas with known average incomes.
Data Setup in SPSS
- Create variables in SPSS: 'Income' (independent), 'Price' (dependent), and optionally, 'caseno' for tracking cases.
- 'Caseno' helps identify and remove outliers but is not part of the regression analysis.
Procedure for Linear Regression in SPSS
- Navigate to Analyze > Regression > Linear in SPSS.
- Move 'Income' to Independent(s) and 'Price' to Dependent in the dialogue box.
- Use Statistics and Plots options to check for outliers, independence, homoscedasticity, and normality of residuals.
- Click OK to run the analysis and generate output tables.
Interpreting SPSS Output
- The Model Summary table provides R (correlation) and R² (variance explained by the model).
- The ANOVA table tests if the regression model significantly predicts the dependent variable (look for Sig. < 0.05).
- The Coefficients table gives the regression equation and indicates if the predictor is statistically significant.
- Example regression equation: Price = 8287 + 0.564(Income), i.e. each one-unit increase in income raises the predicted price by 0.564.
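The example equation above can be used directly for prediction. The coefficients come from the lecture's example output; the income value plugged in below is illustrative only.

```python
def predict_price(income):
    """Apply the lecture's example regression equation:
    Price = 8287 + 0.564 * Income."""
    return 8287 + 0.564 * income

# Predict the price for an illustrative income value
print(predict_price(15000))
```

This is exactly how the Coefficients table is turned into a working prediction: the Constant row supplies the intercept (8287) and the Income row supplies the slope (0.564).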
Key Terms & Definitions
- Dependent variable — The outcome variable being predicted.
- Independent variable — The predictor variable used in regression.
- Continuous variable — A variable measured on an interval or ratio scale.
- Linearity — The assumption that the relationship between variables is straight-line.
- Outlier — A data point far from the predicted value.
- Independence of observations — Each data point is not influenced by others.
- Homoscedasticity — Equal variance of errors along the regression line.
- Residuals — The differences between observed and predicted values.
- R² (R squared) — Proportion of variance in the dependent variable explained by the independent variable.
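The definitions of residuals and R² above fit together in a short calculation. This sketch assumes some fitted model has already produced predicted values; the observed and predicted numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed = [10, 12, 15, 19, 24]
predicted = [9, 13, 16, 18, 24]   # illustrative predictions from a model
# Residuals: observed minus predicted values
residuals = [o - p for o, p in zip(observed, predicted)]
print(residuals)
print(r_squared(observed, predicted))
```

An R² near 1 means the model explains most of the variance in the dependent variable, which is the same interpretation applied to the R² value in the SPSS Model Summary table.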
Action Items / Next Steps
- Practice entering and analyzing sample data in SPSS using the steps provided.
- Review scatterplots, diagnostics, and residual plots to check assumptions before interpreting results.
- Write out and interpret your own regression equation based on SPSS output.