
Understanding Spurious Correlations in Data

Dec 5, 2024

Lecture Notes: Spurious Correlations

Introduction to Spurious Correlations

  • Correlation is not causation
    • A correlation between two variables does not, by itself, show that one causes the other.

Examples of Spurious Correlations

  • Elijah Wood Movies and Orderlies in Oklahoma
    • Charts show a correlation between these two unrelated variables.
    • GenAI's humorous explanation suggests that Middle-earth influences healthcare.
  • Air Pollution in San Diego and the Popularity of 'Kirk'
    • Close correlation tracked over time.
    • AI's explanation humorously connects pollution with naming trends.
  • Petroleum Consumption in Azerbaijan and Farm Equipment Mechanics in Alabama
    • Demonstrates how unrelated variables show correlation.
    • Explanation involves a fictional economic chain reaction.
  • Master's Degrees in Education and Google Searches for 'Gangnam Style'
    • A decrease in degrees awarded correlates with declining interest in the dance.
    • A fictional explanation ties educators' dance skills to the trend.

Detailed Explanations

Why Spurious Correlations Occur

  1. Data Dredging
    • Comparing many variables against each other guarantees that some pairs correlate strongly by chance alone.
    • Example: 25,237 variables form more than 318 million distinct pairs, so some random matches are inevitable.
  2. Lack of Causal Connection
    • The variables have no direct link, even though the chart makes it look as if they do.
    • Example: Matching variables only by 'year' pairs up series that have nothing else in common.
  3. Observations Not Independent
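    • Values from consecutive years are not independent of each other, so two unrelated series can each drift into a trend that happens to match.
    • P-values assume independent observations, so here they can overstate how significant a correlation is.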
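  4. Y-Axes Don't Start at Zero
    • Truncated y-axes make two lines appear to move together more closely, although the correlation coefficient itself is unchanged.
    • Line graphs emphasize connections more than the data warrant.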
  5. Confounding Variables
    • A third variable may influence both variables, producing a correlation with no direct link between them.
  6. Outliers
    • Anomalous data points can skew correlation strength.
  7. Low n (Sample Size)
    • With only a few data points, a strong correlation can easily arise by chance.
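
Several of the causes above (data dredging, non-independent observations, low n) can be reproduced with a small simulation. The sketch below is illustrative only and is not taken from the lecture's source: it generates unrelated random-walk "yearly" series and searches every pair for the strongest correlation. The names `pearson_r`, `random_walk`, and `strongest_pair`, the series count, the series length, and the seed are all assumptions chosen for the example.

```python
import itertools
import math
import random

N_SERIES = 200  # hypothetical number of unrelated variables
N_YEARS = 10    # hypothetical number of yearly observations (low n)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, n_years):
    """A series in which each year's value builds on the previous year's."""
    value, series = 0.0, []
    for _ in range(n_years):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_pair(n_series=N_SERIES, n_years=N_YEARS, seed=0):
    """Search all pairs of unrelated series for the largest |r|."""
    rng = random.Random(seed)
    series = [random_walk(rng, n_years) for _ in range(n_series)]
    best = max(
        itertools.combinations(range(n_series), 2),
        key=lambda pair: abs(pearson_r(series[pair[0]], series[pair[1]])),
    )
    return best, pearson_r(series[best[0]], series[best[1]])


pair, r = strongest_pair()
n_pairs = math.comb(N_SERIES, 2)
print(f"Strongest of {n_pairs} pairs: series {pair}, r = {r:.3f}")
```

Every series here is generated independently, so any strong correlation the search reports is spurious by construction. Raising `N_YEARS` or replacing the random walks with independent noise should make such extreme matches rarer.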

Analysis Tools and Insights

  • Links and methods are provided for examining the correlations in more depth.
  • Python code is available for running the calculations yourself.
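
As a rough idea of what such a calculation can look like, here is a minimal sketch; it is not the code the notes refer to. It computes Pearson's r by hand and estimates a p-value with a permutation test, a technique chosen here only because it needs nothing beyond the standard library. The function names and the two short series are hypothetical placeholders, not real measurements.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Share of random reorderings of ys whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


# Placeholder values for illustration only, not real measurements.
series_a = [3.0, 3.4, 3.9, 4.1, 4.8, 5.2, 5.9, 6.3]
series_b = [10.0, 11.5, 11.0, 13.2, 14.1, 14.0, 16.3, 17.0]

r = pearson_r(series_a, series_b)
p = permutation_p_value(series_a, series_b)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, permutation p = {p:.4f}")
```

Shuffling treats the observations as interchangeable, which is exactly the assumption that trending yearly data violates. A small p-value from this sketch therefore carries the same caveat as item 3 in the list of causes: it might not reflect reality.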

Closing Thoughts

  • Spurious correlations highlight the importance of careful analysis and the potential pitfalls in interpreting data visually or statistically.