Insights on Statistics and Simpson's Paradox

May 30, 2025

Lecture Notes: Understanding Statistics and Simpson's Paradox

Importance of Statistics

  • Statistics are highly influential; they guide decision-making for individuals, organizations, and even countries.
  • However, statistics can mislead when the underlying factors behind the numbers are overlooked.

Simpson's Paradox

  • Defined as a phenomenon where a trend appears in several different groups of data but disappears or reverses when these groups are combined.
  • Arises when a lurking variable, a hidden factor that influences the results, is unevenly distributed across the groups being compared.
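
One way to make the definition precise is to write each overall rate as a weighted average of subgroup rates. The sketch below uses notation that is not in the original notes: A and B are the two options being compared, g indexes the subgroups, and n counts cases. It assumes the amsmath package for \text.

```latex
% r_{X,g}: success rate of option X within subgroup g
% w_{X,g}: share of option X's cases that fall in subgroup g
\[
  r_X = \sum_{g} w_{X,g}\, r_{X,g},
  \qquad w_{X,g} = \frac{n_{X,g}}{n_X},
  \qquad \sum_{g} w_{X,g} = 1 .
\]
% Simpson's paradox: B is better in every subgroup, yet A is better overall.
\[
  r_{B,g} > r_{A,g} \ \text{for all } g
  \quad\text{but}\quad
  r_A > r_B .
\]
% If the weights were equal (w_{A,g} = w_{B,g} for all g), the first
% condition would force r_B > r_A, so a reversal requires unequal weights.
```

The unequal weights are where the lurking variable enters: it determines how each option's cases are spread across the subgroups.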

Example: Hospital Choice

  • Scenario: Choosing between two hospitals for surgery based on survival rates.
    • Hospital A has a higher overall survival rate.
    • When patients are split by health on arrival (good or poor), Hospital B has the better survival rate in both subgroups.
  • Conclusion: Hospital A's better overall rate is misleading because the two hospitals admit different proportions of patients in good and poor health.
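
The reversal is easier to see with numbers. The following is a minimal sketch in Python; the patient counts are hypothetical values chosen to reproduce the pattern described above, not figures from the lecture, and the names counts and survival_rate are illustrative.

```python
# Hypothetical counts for illustration only (not from the notes).
# Maps (hospital, health_on_arrival) -> (survivors, patients).
counts = {
    ("A", "good"): (870, 900),
    ("A", "poor"): (30, 100),
    ("B", "good"): (590, 600),
    ("B", "poor"): (210, 400),
}


def survival_rate(hospital, health=None):
    """Survival rate for a hospital, optionally restricted to one health group."""
    survived = total = 0
    for (h, g), (s, n) in counts.items():
        if h == hospital and (health is None or g == health):
            survived += s
            total += n
    return survived / total


for hospital in ("A", "B"):
    print(
        f"Hospital {hospital}: overall {survival_rate(hospital):.1%}, "
        f"good health {survival_rate(hospital, 'good'):.1%}, "
        f"poor health {survival_rate(hospital, 'poor'):.1%}"
    )
```

With these assumed counts, Hospital A leads overall (90.0% vs 80.0%) while Hospital B leads among patients in good health (98.3% vs 96.7%) and in poor health (52.5% vs 30.0%). Hospital A looks better only because 90% of its patients arrive in good health, compared with 60% at Hospital B.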

Real-World Examples

  • Smokers vs. Non-Smokers:
    • A UK study showed smokers had a higher survival rate than non-smokers.
    • When participants were divided by age group, non-smokers fared better; age was the lurking variable, because the non-smokers in the study were older on average.
  • Death Penalty Cases in Florida:
    • The aggregate data initially showed no racial disparity in sentencing.
    • Dividing the cases by the race of the victim revealed a disparity: within each victim group, black defendants were more likely to receive a death sentence.

Avoiding Simpson's Paradox

  • There is no one-size-fits-all solution; data can be grouped and divided in many ways, and any particular arrangement may mislead.
  • Important to study the underlying situations and identify potential lurking variables.
  • Critical thinking is necessary to avoid being manipulated by data.
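
Checking a suspected lurking variable can be partly mechanized: compare the ranking in the combined data with the ranking inside each stratum of that variable. The sketch below is one possible helper, not a standard method. The record layout, the field names ("option", "successes", "total") and the function names are assumptions made for illustration, and the sample records are hypothetical.

```python
from collections import defaultdict


def rates_by(records, keys):
    """Success rate for each combination of values of the given keys."""
    tally = defaultdict(lambda: [0, 0])  # key tuple -> [successes, total]
    for rec in records:
        k = tuple(rec[key] for key in keys)
        tally[k][0] += rec["successes"]
        tally[k][1] += rec["total"]
    return {k: s / n for k, (s, n) in tally.items()}


def overall_leader_loses_everywhere(records, stratum_key):
    """True if the option with the best overall rate is best in no stratum."""
    overall = rates_by(records, ["option"])
    leader = max(overall, key=overall.get)[0]
    stratified = rates_by(records, ["option", stratum_key])
    strata = {stratum for (_, stratum) in stratified}
    for stratum in strata:
        options = [o for (o, st) in stratified if st == stratum]
        best = max(options, key=lambda o: stratified[(o, stratum)])
        if best == leader:
            return False
    return True


# Hypothetical records for illustration only.
records = [
    {"option": "A", "health": "good", "successes": 870, "total": 900},
    {"option": "A", "health": "poor", "successes": 30, "total": 100},
    {"option": "B", "health": "good", "successes": 590, "total": 600},
    {"option": "B", "health": "poor", "successes": 210, "total": 400},
]

print(overall_leader_loses_everywhere(records, "health"))
```

A True result only flags a reversal for the variable that was tested. It does not say whether the combined or the divided figures are the right ones to trust; that still depends on understanding how the data came about.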

Conclusion

  • Statistics need careful interpretation to avoid manipulation.
  • Understanding the context and potential hidden factors is essential to accurately interpret data.