📊

Statistics Overview and Key Concepts

Sep 7, 2025

Types of Data

  • Data is divided into categorical and numerical types.
  • Categorical data: Nominal (no order, e.g., team names) and ordinal (ordered, e.g., player positions).
  • Numerical data: Discrete (countable, e.g., missed free throws) and continuous (measurable, e.g., height).
  • Proportions summarize nominal data into numerical form (e.g., 3-point shooting percentage).
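As a small illustration of the last point, the sketch below turns a list of nominal outcomes into a proportion. The shot log is made up for the example and is not data from the lecture.

```python
from collections import Counter

# Hypothetical shot log: each attempt is a nominal outcome, "made" or "missed".
attempts = ["made", "missed", "missed", "made", "missed",
            "made", "missed", "missed", "made", "missed"]

counts = Counter(attempts)                          # tally each category
three_point_pct = counts["made"] / len(attempts)    # proportion of made shots

print(counts["made"], "of", len(attempts), "made")
print(f"3-point proportion: {three_point_pct:.2f}")
```

The categories themselves have no numeric meaning; only the count of one category divided by the total does.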

Probability Distributions

  • A probability distribution describes how likely each possible value (or range of values) of a variable is.
  • Common distributions: Normal (bell curve), uniform (equal probability), bimodal (two peaks), skewed (long tail).
  • Many real-world measurements (like NBA heights) are approximately normally distributed, but not all data are: counts, incomes, and waiting times are often skewed.
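One way to get a feel for the bell curve is to simulate it. The sketch below draws heights from a normal distribution and checks how many fall within one and two standard deviations of the mean. The mean and standard deviation are assumed values chosen for illustration, not real NBA figures.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed values for illustration only, not real NBA figures.
MEAN_HEIGHT_IN = 79.0
SD_HEIGHT_IN = 3.5

heights = [random.gauss(MEAN_HEIGHT_IN, SD_HEIGHT_IN) for _ in range(10_000)]

def share_within(num_sds):
    """Fraction of simulated heights within num_sds standard deviations of the mean."""
    limit = num_sds * SD_HEIGHT_IN
    return sum(abs(h - MEAN_HEIGHT_IN) <= limit for h in heights) / len(heights)

within_one_sd = share_within(1)
within_two_sd = share_within(2)

print(f"within 1 SD: {within_one_sd:.3f}")  # roughly 0.68 for a normal distribution
print(f"within 2 SD: {within_two_sd:.3f}")  # roughly 0.95 for a normal distribution
```

A skewed or bimodal variable would not show these shares, which is one quick check on whether a normal model is reasonable.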

Sampling and Estimation

  • A sample is a subset of data from a larger population.
  • Sample statistics (like sample mean or proportion) estimate population parameters (unknown true values like long-term shooting percentage).
  • Larger sample sizes yield more precise (less variable) estimates.
  • A confidence interval gives a range of plausible values for the true parameter (e.g., a 95% confidence interval comes from a procedure that captures the true value in about 95% of repeated samples).
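The link between sample size and precision can be sketched with a confidence interval for a proportion. This example uses the common normal-approximation interval (p ± 1.96 × standard error), which is one of several possible methods. The function name and the shot counts are hypothetical.

```python
import math

def proportion_ci(made, attempts, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p_hat = made / attempts                          # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / attempts)   # standard error shrinks as attempts grows
    return p_hat - z * se, p_hat + z * se

# Same observed proportion (0.40) at two hypothetical sample sizes.
small = proportion_ci(40, 100)
large = proportion_ci(400, 1000)

print(f"n = 100:  ({small[0]:.3f}, {small[1]:.3f})")
print(f"n = 1000: ({large[0]:.3f}, {large[1]:.3f})")
```

Both intervals are centered on 0.40, but the one from the larger sample is noticeably narrower. The normal approximation is rough for small samples or proportions near 0 or 1.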

Parameters and Statistics

  • Population parameters use Greek symbols: μ (mean), σ (standard deviation), π or θ (proportion), ρ (correlation), β (regression slope).
  • Sample statistics use Roman letters: x̄ (mean), s (standard deviation), p (proportion), r (correlation), b (slope).
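To make the notation concrete, here is a sketch that computes x̄, s, r, and b from a tiny sample. The heights and weights are invented for the example. Each result is a sample statistic that estimates the matching Greek-letter parameter.

```python
import statistics

# Hypothetical sample: heights (inches) and weights (pounds) for five players.
heights = [75, 78, 80, 82, 85]
weights = [190, 210, 220, 240, 250]

x_bar = statistics.mean(heights)   # sample mean, estimates mu
s = statistics.stdev(heights)      # sample standard deviation (n - 1 denominator), estimates sigma

# Correlation r and slope b, computed from their definitions.
y_bar = statistics.mean(weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in weights)

r = sxy / (sxx * syy) ** 0.5       # sample correlation, estimates rho
b = sxy / sxx                      # least-squares slope of weight on height, estimates beta

print(f"x_bar = {x_bar}, s = {s:.2f}, r = {r:.3f}, b = {b:.2f}")
```

A different sample of five players would give different values of x̄, s, r, and b, while μ, σ, ρ, and β stay fixed.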

Hypothesis Testing

  • Hypothesis tests assess evidence for or against a claim about a population parameter.
  • Null hypothesis (H₀): default assumption (e.g., θ ≤ 0.5).
  • Alternate hypothesis (H₁): what you seek evidence for (e.g., θ > 0.5).
  • The result is either to “reject” or “not reject” the null hypothesis—never to “prove” or “accept” anything.
  • The level of significance (often 0.05) defines the rejection region (when evidence is strong enough to reject H₀).
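The rejection region can be worked out explicitly for a simple case. The sketch below assumes a one-sided test of H₀: θ ≤ 0.5 against H₁: θ > 0.5 based on the number of made shots, using an exact binomial calculation. The function names and the choice of 20 attempts are hypothetical.

```python
from math import comb

def upper_tail(k, n, theta):
    """P(X >= k) for X ~ Binomial(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(k, n + 1))

def critical_value(n, theta0=0.5, alpha=0.05):
    """Smallest count k with P(X >= k | theta0) <= alpha; reject H0 when made >= k."""
    for k in range(n + 1):
        if upper_tail(k, n, theta0) <= alpha:
            return k
    return None  # no count is extreme enough at this alpha

k_crit = critical_value(20)
print(f"With 20 attempts, reject H0 if made >= {k_crit}")
```

With 20 attempts and a 0.05 significance level, the rejection region is 15 or more made shots. With very few attempts, no outcome may be extreme enough to reject, which is why the function can return None.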

P-Values

  • The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in the sample.
  • If p-value < significance level (e.g., < 0.05), reject the null hypothesis.
  • Small p-value = strong evidence against H₀; large p-value = weak evidence.
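The decision rule above can be sketched for a shooting example. This assumes a one-sided exact binomial test of H₀: θ ≤ 0.5, with the p-value computed at the boundary θ = 0.5. The function name and the shot counts are hypothetical.

```python
from math import comb

def binomial_p_value(made, attempts, theta0=0.5):
    """One-sided p-value: P(X >= made) when X ~ Binomial(attempts, theta0)."""
    return sum(
        comb(attempts, i) * theta0**i * (1 - theta0)**(attempts - i)
        for i in range(made, attempts + 1)
    )

ALPHA = 0.05  # level of significance

for made in (12, 16):
    p = binomial_p_value(made, 20)
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{made}/20 made: p-value = {p:.4f} -> {decision}")
```

Making 12 of 20 is quite plausible for a 50% shooter, so the p-value is large and H₀ is not rejected. Making 16 of 20 would be rare under H₀, so the p-value is small and H₀ is rejected.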

P-Hacking and Research Validity

  • P-hacking: testing many hypotheses and only reporting those with p < 0.05, inflating false-positive findings.
  • Proper research tests predefined hypotheses; improper research looks for any significant finding post hoc.
  • Multiple testing increases the chance of finding a false “significant” result.
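The effect of multiple testing can be quantified under a simplifying assumption: every null hypothesis is true and the tests are independent. In that case the chance of at least one false "significant" result across m tests is 1 − (1 − α)^m. The function name below is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one p < alpha) when every null is true and tests are independent."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 20, 100):
    print(f"{m:>3} tests: {chance_of_false_positive(m):.2f}")
```

With a single predefined test the false-positive risk stays at 5%, but with 20 unplanned tests it rises to roughly 64%. Real tests are rarely fully independent, so treat these figures as a rough guide.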

Key Terms & Definitions

  • Categorical Data — data sorted into groups or categories.
  • Numerical Data — data measured as numbers, either discrete or continuous.
  • Proportion — a summary metric showing the fraction of occurrences.
  • Population Parameter — an unknown, fixed value describing the whole population.
  • Sample Statistic — a value calculated from a sample used to estimate a parameter.
  • Hypothesis Test — a procedure to evaluate evidence about a population.
  • Null Hypothesis (H₀) — the assumption that there is no effect or difference.
  • Alternate Hypothesis (H₁) — what we seek evidence for in a test.
  • P-Value — the probability of observing a result at least as extreme as the sample statistic if the null hypothesis is true.
  • Confidence Interval — a range of plausible values for a parameter, computed from sample data at a stated confidence level (e.g., 95%).
  • P-Hacking — misuse of statistical testing by running many unplanned comparisons and reporting only the significant ones.

Action Items / Next Steps

  • Review the definitions of data types, parameters, and hypothesis test procedures.
  • Consider further readings or videos on the normal distribution, regression, and standard deviation for deeper understanding.
  • Be aware of p-hacking and the importance of predefining hypotheses in research.