Understanding Pearson's Correlation Coefficient

Aug 14, 2024

Lecture Notes on Correlation Coefficient - Pearson's r

Introduction

  • Previous discussions involved interpreting graphs using descriptive terms like:
    • Strong positive association
    • Moderate negative association
    • Weak negative association
    • No association
  • These notes move from descriptive terms to a numerical measure of association: the correlation coefficient known as Pearson's r

Pearson's r: Overview

  • Pearson's r is a correlation coefficient that quantifies the degree of association between two variables in a scatter plot.
  • Range: Between -1 and 1
    • 1: Perfect positive correlation
    • -1: Perfect negative correlation
    • 0: No linear association

Understanding Correlation

  • Perfect Positive Correlation
    • Example: Cost of bananas based on weight purchased (linear relationship)
  • Perfect Negative Correlation
    • Example: Distance from home decreases as you drive toward home at constant speed

Intervals of Correlation Coefficients

  • No Association: r between -0.25 and 0.25
  • Weak Positive: r between 0.25 and 0.5
  • Weak Negative: r between -0.25 and -0.5
  • Moderate Positive: r between 0.5 and 0.75
  • Moderate Negative: r between -0.5 and -0.75
  • Strong Positive: r between 0.75 and 1
  • Strong Negative: r between -0.75 and -1
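
The intervals above can be turned into a small lookup. This is only a sketch: the function name describe_association is made up for illustration, and the notes do not say which label a boundary value such as exactly 0.25 or 0.5 receives, so the code assumes a boundary value takes the stronger label.

```python
def describe_association(r):
    """Map a value of r to the descriptive labels from the interval list.

    Assumption (not stated in the notes): a boundary value such as
    0.25, 0.5 or 0.75 is given the stronger of the two labels.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)  # strength depends only on the distance from 0
    if size < 0.25:
        return "no association"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_association(0.6))   # moderate positive
print(describe_association(-0.7))  # moderate negative
```

The two calls match the examples used later in these notes: r = 0.6 is moderate positive, and r = -0.7 falls in the moderate negative interval.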

Describing Associations

  • Given r = 0.6
    • Classified as moderate positive
    • Sketch: Dots moving upwards, slightly spread out

Estimating r from a Graph

  • If the graph trends downwards and dots are close, r might be approximately -0.7

Linear vs Non-linear Associations

  • Pearson's r measures only the strength of a linear (straight-line) association.
  • For non-linear patterns, r can be misleading: a strong curved relationship can still produce an r close to 0.
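
A quick way to see why r says little about curved patterns is to compute it for a perfect parabola. The data below are invented for illustration, and pearson_r is a hypothetical helper that uses one common algebraic form of the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r as (sum of co-deviations) / sqrt(product of squared deviations)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y is completely determined by x, but along a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```

Here y depends entirely on x, yet r is 0, because the downward-sloping left half of the curve cancels the upward-sloping right half.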

Calculating Pearson's r

  • Complex Calculation
    • Involves calculating average coordinates, differences, standard deviations, and standardized areas
  • Steps:
    • Find average x and y coordinates
    • Measure distances from averages
    • Standardize using standard deviation
    • Multiply standardized values for each dot
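    • Sum these products and divide by n - 1, where n is the number of dots (or by n, if the population standard deviation was used to standardize), to find r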
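
The steps above can be sketched in code. This version assumes the sample standard deviation (dividing by n - 1) is used both to standardize and in the final division; the notes do not say which convention the lecture used. The function name and the data are made up for illustration.

```python
import math


def pearson_r_stepwise(xs, ys):
    """Follow the listed steps one at a time to compute Pearson's r."""
    n = len(xs)

    # Step 1: find the average x and y coordinates.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: measure each dot's distance from the averages.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Step 3: standardize using the standard deviation
    # (assumption: sample standard deviation, dividing by n - 1).
    sd_x = math.sqrt(sum(d * d for d in dx) / (n - 1))
    sd_y = math.sqrt(sum(d * d for d in dy) / (n - 1))
    zx = [d / sd_x for d in dx]
    zy = [d / sd_y for d in dy]

    # Step 4: multiply the standardized values for each dot.
    products = [a * b for a, b in zip(zx, zy)]

    # Step 5: sum the products and divide by n - 1.
    return sum(products) / (n - 1)


# Made-up data: five dots.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r_stepwise(hours, scores)
print(round(r, 3))  # 0.775
```

For these five dots r is about 0.775, which the intervals in these notes would describe as a strong positive association.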

Practical Use

  • Calculation by hand is tedious and rarely done in practice
  • Use calculators or software like Excel for efficient computation

Conclusion

  • Understanding the intuition behind r calculation shows why certain quadrants of a scatter plot contribute positive or negative values
  • Calculators and software tools simplify the determination of Pearson's r
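
The quadrant idea can be checked with a short sketch. It splits the scatter plot at the average x and average y and reports the sign of each dot's product of distances; standardizing would not change these signs, because standard deviations are positive. The function name and the four dots are made up for illustration.

```python
def quadrant_contributions(xs, ys):
    """Return (x, y, sign) for each dot, where sign is the sign of
    (x - mean_x) * (y - mean_y), the dot's contribution to r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    rows = []
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:
            sign = "positive"   # upper-right or lower-left quadrant
        elif product < 0:
            sign = "negative"   # upper-left or lower-right quadrant
        else:
            sign = "zero"       # dot lies on one of the average lines
        rows.append((x, y, sign))
    return rows


# Made-up dots, one in each quadrant around the averages (3, 3.5).
points_x = [1, 2, 4, 5]
points_y = [1, 5, 2, 6]

for row in quadrant_contributions(points_x, points_y):
    print(row)
```

Dots in the upper-right and lower-left quadrants push r upward, while dots in the other two quadrants pull it downward.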