Understanding Spurious Correlations in Data
Dec 5, 2024
Lecture Notes: Spurious Correlations
Introduction to Spurious Correlations
Correlation is not causation
Two variables can be correlated without either one causing the other.
Examples of Spurious Correlations
Elijah Wood Movies and Orderlies in Oklahoma
Charts show correlation between these two unrelated variables.
GenAI's humorous explanation suggests Middle Earth's influence on healthcare.
Air Pollution in San Diego and the Popularity of 'Kirk'
Close correlation tracked over time.
AI's explanation humorously connects pollution with naming trends.
Petroleum Consumption in Azerbaijan and Farm Equipment Mechanics in Alabama
Demonstrates how unrelated variables show correlation.
Explanation involves a fictional economic chain reaction.
Master's Degrees in Education and Google Searches for 'Gangnam Style'
Decrease in degrees correlates with less interest in the dance.
A fictional explanation ties educators' dance skills to the trend.
Detailed Explanations
Why Spurious Correlations Occur
Data Dredging
Searching a large set of variables for matches turns up strong correlations by chance alone.
Example: Comparing 25,237 variables against one another yields strong matches that are purely random.
Lack of Causal Connection
There is probably no direct link between the variables, even when the chart suggests one.
Example: Using 'years' as the base variable pairs up things that merely happened in the same year.
Observations Not Independent
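Values from consecutive years are not independent of each other, so two unrelated series can follow similar trend lines.
A simple p-value calculation assumes independence, so it can make a match look less likely to be chance than it really is.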
Y-Axis Doesn’t Start at Zero
Graphs can visually deceive.
Line graphs emphasize connections more than warranted.
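The visual effect of a truncated axis comes down to simple arithmetic. The sketch below uses made-up numbers and a hypothetical helper, `plotted_height_share`, to show how much of the plot's height the same change occupies on a zero-based axis versus a truncated one.

```python
def plotted_height_share(low, high, axis_min, axis_max):
    """Fraction of the plot's height spanned by a move from low to high."""
    return (high - low) / (axis_max - axis_min)


# Made-up values: a series that moves from 95 to 100.
zero_based = plotted_height_share(95, 100, axis_min=0, axis_max=100)
truncated = plotted_height_share(95, 100, axis_min=95, axis_max=100)

print(zero_based, truncated)  # 0.05 1.0
```

The same 5-unit change fills 5% of a zero-based plot but the entire height of the truncated one.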
Confounding Variables
A third variable may be driving both of the variables being compared.
Outliers
Anomalous data points can skew correlation strength.
Low n (Sample Size)
Few data points lead to misleading correlations.
Analysis Tools and Insights
The source provides links and methods for examining each correlation in more depth.
Python code is available for recomputing the correlations yourself.
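For readers who want to run such a calculation themselves, here is a minimal standard-library sketch. It is not the source's code: the function names are hypothetical, the sample values are invented placeholders, and the p-value comes from a permutation test that is swapped in here to avoid third-party dependencies.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of random re-pairings at least as correlated as the real pairing.

    Shuffling treats observations as exchangeable, so like a textbook
    p-value it will look too confident for trending yearly series.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of 0.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented placeholder values, one per year; substitute real series.
    series_a = [12.0, 13.5, 13.1, 15.2, 16.8, 16.1, 18.4, 19.0]
    series_b = [30.2, 31.0, 32.5, 33.1, 35.9, 35.2, 37.7, 39.4]
    r = pearson_r(series_a, series_b)
    print(f"r = {r:.3f}, r squared = {r * r:.3f}")
    print(f"permutation p-value = {permutation_p_value(series_a, series_b):.4f}")
```

A small p-value here only says the pairing is unlikely under random shuffling; it does not address the dredging, dependence, or confounding issues listed above.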
Closing Thoughts
Spurious correlations highlight the importance of careful analysis and the potential pitfalls in interpreting data visually or statistically.
Source:
https://www.tylervigen.com/spurious-correlations