Introduction to Statistics

Jul 23, 2024

Lecture: Introduction to Statistics by Justin Zeltzer

Overview

  • Goal: Explain statistics in under half an hour without math
  • Audience: Beginners in statistics; those interested in basic concepts
  • Theme: Examples from NBA basketball

Types of Data

Categorical Data

  • Nominal: No order
    • Example: Teams Steph Curry can play for
  • Ordinal: Ordered categories
    • Example: Positions in basketball (Guard, Forward, Center)

Numerical Data

  • Discrete: Countable values, typically whole-number counts
    • Example: Number of free throws missed
  • Continuous: Any value within a range; measurements can be subdivided without limit
    • Example: Player’s height (e.g., 191.3 cm)

Proportions

  • Proportions as Numerical Data:
    • Example: Steph Curry’s three-point percentage
    • Percentages are proportions out of 100
    • Built from nominal data (made/missed shots)
    • Discussion on whether proportions are discrete or continuous
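
As a rough illustration of how a proportion is built from nominal made/missed outcomes, the sketch below uses a made-up shot log (the `shots` list is hypothetical, not real data for any player). It also lists the proportions that are possible for a fixed number of attempts, which is one way to frame the discrete-versus-continuous question.

```python
# Hypothetical shot log: each attempt is a nominal outcome, "made" or "missed".
shots = ["made", "missed", "made", "made", "missed", "missed", "made", "missed"]

made = shots.count("made")
attempts = len(shots)

proportion = made / attempts    # a value between 0 and 1
percentage = 100 * proportion   # the same quantity expressed out of 100

# With a fixed number of attempts, only attempts + 1 distinct proportions are
# possible (0/8, 1/8, ..., 8/8), which is the argument for calling it discrete.
possible_values = [k / attempts for k in range(attempts + 1)]

print(proportion, percentage, len(possible_values))
```

As the number of attempts grows, the gaps between possible values shrink, which is why proportions are often treated as if they were continuous.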

Distributions

Probability Density Function (PDF)

  • Example: Height distribution of NBA players (Isaiah Thomas 5’9”, Boban Marjanovic 7’3”)
  • Normal Distribution (Bell Curve): Most players around average height
  • Other Types:
    • Uniform Distribution: Equal probability density across all values in a range
    • Bimodal Distribution: Two peaks
    • Skewed Distribution: Long tail in one direction (left or right skew)
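
The shapes listed above can be sketched by drawing random values with Python's standard library. All numbers here (means, spreads, ranges) are invented for illustration and are not real NBA height data; `skewness` is a helper defined only for this sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
N = 10_000

def skewness(values):
    """Average cubed z-score: near 0 if symmetric, positive for a right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical "heights" in cm, one list per distribution shape.
samples = {
    "normal": [random.gauss(200, 9) for _ in range(N)],
    "uniform": [random.uniform(180, 220) for _ in range(N)],
    "bimodal": [
        random.gauss(190, 4) if random.random() < 0.5 else random.gauss(210, 4)
        for _ in range(N)
    ],
    "right-skewed": [180 + random.expovariate(1 / 10) for _ in range(N)],
}

for name, values in samples.items():
    print(f"{name:>12}: mean={statistics.fmean(values):6.1f}  skew={skewness(values):5.2f}")
```

The symmetric shapes should show skewness near zero, while the right-skewed sample should show a clearly positive value.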

Sampling Distributions

  • Concerned with the distribution of sample means rather than individual measurements
  • Law of Large Numbers: As the sample size grows, the sample mean tends to fall closer to the population mean, and the variance of the sample mean shrinks
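
A small simulation can make the idea of a sampling distribution concrete. The sketch below assumes a hypothetical population of heights (invented numbers) and compares how much sample means vary for different sample sizes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of player heights in cm (illustrative numbers only).
population = [random.gauss(200, 9) for _ in range(50_000)]
mu = statistics.fmean(population)

def sample_means(n, repeats=1_000):
    """Draw `repeats` samples of size n and return the mean of each one."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]

# Standard deviation of the sample means for each sample size.
spread = {n: statistics.stdev(sample_means(n)) for n in (5, 50, 500)}

for n, sd in spread.items():
    print(f"n={n:>3}: sample means vary around {mu:.1f} with sd {sd:.2f}")
```

The spread of the sample means should fall as n rises, which is the pattern the Law of Large Numbers bullet describes.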

Estimation and Inference

  • Sample Statistic vs Parameter: Sample provides an estimate, not the precise population value
    • Example: Steph Curry’s true three-point shooting percentage is the parameter (theta, θ); his observed percentage is only an estimate of it
  • Confidence Intervals: Provide a range of plausible values for the true parameter at a stated confidence level (e.g., 95%)
    • Example: Confidence intervals for individual players (e.g., Meyers Leonard vs Steph Curry)
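
One common way to compute a confidence interval for a proportion is the normal-approximation (Wald) formula; the lecture notes do not say which formula was used, so this is an assumption. The shot counts below are hypothetical and are not the real figures for either player named above.

```python
import math

def proportion_ci(made, attempts, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    p = made / attempts
    margin = z * math.sqrt(p * (1 - p) / attempts)
    return p - margin, p + margin

# Two hypothetical shooters with the same 40% success rate
# but very different numbers of attempts.
low_volume = proportion_ci(40, 100)
high_volume = proportion_ci(400, 1000)

for label, (lo, hi) in (("100 attempts", low_volume), ("1000 attempts", high_volume)):
    print(f"{label:>13}: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

Both intervals are centred on 0.40, but the one based on fewer attempts is noticeably wider, reflecting greater uncertainty about the true parameter.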

Parameters and Statistics

  • Parameters (Greek Letters):
    • Mu (μ): Mean
    • Sigma (σ): Standard deviation
    • Pi (π): Proportion
    • Rho (ρ): Correlation
    • Beta (β): Gradient in regression
  • Sample Statistics (Roman letters):
    • X-bar (x̄): Sample mean
    • s: Sample standard deviation
    • p: Sample proportion
    • r: Sample correlation
    • b: Sample gradient
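
To show how the sample statistics above are calculated from data, here is a minimal sketch using invented values. The `minutes`, `points`, and `shots` lists are hypothetical; each result estimates the corresponding Greek-letter parameter.

```python
import math
import statistics

# Hypothetical data for five games (illustrative values only).
minutes = [10, 20, 30, 40, 50]   # x: minutes played
points = [4, 8, 10, 8, 10]       # y: points scored
shots = [1, 0, 1, 1, 0]          # 1 = made, 0 = missed

x_bar = statistics.mean(minutes)   # sample mean, estimates mu
s = statistics.stdev(minutes)      # sample standard deviation, estimates sigma
p = sum(shots) / len(shots)        # sample proportion, estimates pi

y_bar = statistics.mean(points)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(minutes, points))
sxx = sum((x - x_bar) ** 2 for x in minutes)
syy = sum((y - y_bar) ** 2 for y in points)

r = sxy / math.sqrt(sxx * syy)     # sample correlation, estimates rho
b = sxy / sxx                      # sample gradient (slope), estimates beta

print(x_bar, round(s, 3), p, round(r, 3), b)
```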

Hypothesis Testing

  • Null Hypothesis (H₀): Assumes no effect or status quo
  • Alternative Hypothesis (H₁): What we seek evidence for (e.g., player shooting above 50%)
  • Example: Meyers Leonard’s three-point shooting
  • Rejection Region: Range of sample-statistic values extreme enough to reject H₀
  • Significance Level (α): Often set at 0.05 (5%)
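
The rejection region can be worked out explicitly for a simple case. This sketch assumes H₀: π = 0.5 against H₁: π > 0.5, a hypothetical 100 attempts, and an exact binomial calculation (one possible choice of test, not necessarily the one used in the lecture).

```python
from math import comb

def upper_tail(k, n, pi0):
    """P(X >= k) when X ~ Binomial(n, pi0), i.e. when H0 is true."""
    return sum(comb(n, i) * pi0**i * (1 - pi0) ** (n - i) for i in range(k, n + 1))

n, pi0, alpha = 100, 0.5, 0.05  # hypothetical attempts, H0 value, significance level

# Smallest number of made shots whose upper-tail probability is at most alpha.
critical = next(k for k in range(n + 1) if upper_tail(k, n, pi0) <= alpha)

print(f"Reject H0 if the player makes at least {critical} of {n} shots")
```

Any count at or above `critical` falls in the rejection region; anything below it does not provide enough evidence to reject H₀ at the 5% level.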

Key Points on Hypothesis Testing

  • Never say “prove” or “accept” in conclusions
  • Conclusions are phrased as either rejecting or not rejecting H₀, depending on the strength of the evidence
  • Analogy with judicial system: not enough evidence ≠ innocence

p-Values

  • Definition: The probability, assuming H₀ is true, of observing a sample statistic at least as extreme as the one obtained
  • Interpretation: A smaller p-value means a more extreme sample and stronger evidence against H₀
  • Relation to rejection region and significance level (α): A p-value below α means the sample statistic lies in the rejection region, so H₀ is rejected
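
A p-value can be computed directly for a one-sided test of a proportion. The sketch below assumes H₀: π = 0.5, H₁: π > 0.5, and hypothetical results out of 100 attempts, using an exact binomial tail probability; `p_value_upper` is a helper defined only for this sketch.

```python
from math import comb

def p_value_upper(made, attempts, pi0):
    """P(X >= made) under H0, where X ~ Binomial(attempts, pi0)."""
    return sum(
        comb(attempts, i) * pi0**i * (1 - pi0) ** (attempts - i)
        for i in range(made, attempts + 1)
    )

alpha = 0.05
# Hypothetical observed counts of made shots out of 100 attempts.
results = {made: p_value_upper(made, 100, 0.5) for made in (55, 60, 65)}

for made, p in results.items():
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"{made} made: p-value = {p:.4f} -> {decision}")
```

The more extreme the observed count, the smaller the p-value; only counts whose p-value is at most α lead to rejecting H₀.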

Ethical Considerations in Research

P-Hacking

  • Definition: Manipulating data collection or analysis to achieve desired p-values
  • Good Research: Theorize, collect, test targeted effect
  • Bad Research: Collect data broadly, test many effects, report significant ones
  • Problem: Increases likelihood of finding false positives
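
The false-positive problem can be illustrated with a simulation in which no real effect exists. The sketch assumes each "study" tests 20 unrelated effects, each with 100 hypothetical attempts at a true rate of exactly 50%, and counts how often at least one test comes out significant. All of these settings are invented for illustration.

```python
import random
from math import comb

random.seed(2)  # fixed seed so the sketch is repeatable
ATTEMPTS, PI0, ALPHA = 100, 0.5, 0.05
EFFECTS_PER_STUDY, STUDIES = 20, 1_000

def upper_tail(k):
    """P(X >= k) for X ~ Binomial(ATTEMPTS, PI0), i.e. under H0."""
    return sum(
        comb(ATTEMPTS, i) * PI0**i * (1 - PI0) ** (ATTEMPTS - i)
        for i in range(k, ATTEMPTS + 1)
    )

# Smallest count of made shots that would be declared significant.
critical = next(k for k in range(ATTEMPTS + 1) if upper_tail(k) <= ALPHA)

def study_finds_something():
    """Test many effects where H0 is true; report whether any looks significant."""
    for _ in range(EFFECTS_PER_STUDY):
        made = sum(random.random() < PI0 for _ in range(ATTEMPTS))
        if made >= critical:
            return True
    return False

false_positive_rate = sum(study_finds_something() for _ in range(STUDIES)) / STUDIES
print(f"Studies reporting a 'significant' effect: {false_positive_rate:.0%}")
```

Although each individual test has a false-positive rate of at most 5%, the share of simulated studies with at least one spurious "significant" result should come out far higher, which is why testing many effects and reporting only the significant ones is misleading.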

Additional Resources

  • Further learning available at zstatistics.com

Conclusion

  • Summary of key concepts with NBA examples
  • Aim to provide intuition and understanding of statistical principles