Overview
This lecture explains why correlation between two variables does not mean that one causes the other, using five real-world examples.
Concept: Correlation vs. Causation
- Correlation means that two variables tend to move together; on its own, it does not prove that one causes the other.
- Assuming causation from correlation can lead to incorrect conclusions about data relationships.
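The lecture does not name a specific measure of correlation. Assuming the common choice, Pearson's correlation coefficient, the strength of a linear relationship between paired observations (x_i, y_i) can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the sample means, and r ranges from -1 to 1. The formula is symmetric in x and y, so swapping the two variables gives the same value. That symmetry is one way to see why r alone cannot say which variable, if either, is the cause.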
Example 1: Ice Cream Sales & Shark Attacks
- Monthly ice cream sales and shark attacks are highly correlated.
- Both rise in warmer months because more people are at the beach, not because one causes the other.
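The ice cream and shark attack pattern can be reproduced with made-up numbers. The sketch below is an illustration only: the data is synthetic, `temperature` is an assumed stand-in for warm weather and beach crowds, and the helper names `pearson` and `residuals` are hypothetical. Both series are generated from temperature alone, so any correlation between them comes from the shared driver.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What remains of ys after subtracting a least-squares line fitted on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: 120 months of temperature (deg C) with a yearly cycle.
temperature = [15 + 10 * math.sin(2 * math.pi * m / 12) for m in range(120)]

# Both outcomes depend on temperature only; neither depends on the other.
ice_cream_sales = [100 + 5 * t + rng.gauss(0, 10) for t in temperature]
shark_attacks = [2 + 0.3 * t + rng.gauss(0, 1) for t in temperature]

# Correlation of the raw series.
r_raw = pearson(ice_cream_sales, shark_attacks)

# Correlation after removing the part of each series explained by temperature.
r_controlled = pearson(residuals(ice_cream_sales, temperature),
                       residuals(shark_attacks, temperature))

print(f"raw correlation:            {r_raw:.2f}")
print(f"controlling for temperature: {r_controlled:.2f}")
```

In this setup the raw correlation should come out strongly positive, while the correlation of the residuals should sit near zero. Removing the third variable removes the apparent link.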
Example 2: Master's Degrees vs. Box Office Revenue
- The number of Master's degrees awarded and box office revenue rise together over time.
- This is likely because both are influenced by a growing global population, not by direct causation.
Example 3: Pool Drownings vs. Nuclear Energy Production
- Pool drownings and nuclear energy production both increase over the years.
- Both increases track population growth; neither causes the other.
Example 4: Measles Cases vs. Marriage Rate
- Measles cases and marriage rates decline at the same time.
- The declines are independent: medical advances reduce measles cases, while social changes reduce marriage rates.
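Two series that merely trend in the same direction will correlate even when nothing connects them. One common check, which is not part of the lecture itself, is to correlate year-to-year changes instead of levels. The sketch below uses synthetic numbers, and the names `pearson` and `yearly_changes` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def yearly_changes(series):
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)  # fixed seed so the run is repeatable
years = range(100)

# Synthetic data: two series that decline for unrelated reasons,
# each with its own independent noise.
measles_cases = [1000 - 8 * t + rng.gauss(0, 30) for t in years]
marriage_rate = [12 - 0.05 * t + rng.gauss(0, 0.3) for t in years]

# Levels share a downward trend, so they correlate strongly.
r_levels = pearson(measles_cases, marriage_rate)

# Year-to-year changes strip out the steady trend.
r_changes = pearson(yearly_changes(measles_cases),
                    yearly_changes(marriage_rate))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

If the two series were truly linked, their year-to-year changes would tend to move together as well. Here the levels correlate strongly but the changes should not, which is consistent with two independent declines.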
Example 5: High School Graduates vs. Pizza Consumption
- The number of high school graduates and total pizza consumption both grow over time.
- Both trends are explained by an increasing U.S. population, not a causal link.
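Examples 2, 3, and 5 share the same explanation: totals grow because the population grows. A simple way to test that explanation is to divide each total by population and correlate the per-person rates. The sketch below assumes synthetic data with fixed per-person rates; all figures and the helper name `pearson` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the run is repeatable

# Synthetic data: population (in millions) grows steadily over 60 years.
population = [200 + 2 * t for t in range(60)]

# Each total is a fixed per-person rate times population,
# with independent 2% noise. Neither total depends on the other.
graduates = [0.010 * p * (1 + rng.gauss(0, 0.02)) for p in population]
pizzas = [20 * p * (1 + rng.gauss(0, 0.02)) for p in population]

# Totals both scale with population, so they correlate strongly.
r_totals = pearson(graduates, pizzas)

# Per-person rates remove the effect of population size.
graduates_per_capita = [g / p for g, p in zip(graduates, population)]
pizzas_per_capita = [z / p for z, p in zip(pizzas, population)]
r_per_capita = pearson(graduates_per_capita, pizzas_per_capita)

print(f"correlation of totals:     {r_totals:.2f}")
print(f"correlation of per-capita: {r_per_capita:.2f}")
```

In this setup the totals should correlate strongly while the per-capita rates should not, which is what one would expect if population is the third variable behind both trends.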
Key Terms & Definitions
- Correlation — A statistical relationship showing that two variables move together.
- Causation — When a change in one variable directly produces a change in another.
- Third Variable — An outside influence that explains the correlation between two variables.
Action Items / Next Steps
- Review tutorials on correlation, causation, and related statistical concepts for deeper understanding.
- Practice identifying possible third variables or alternate explanations in data sets.