Statistics Overview and Key Concepts
Sep 7, 2025
Types of Data
Data is divided into categorical and numerical types.
Categorical data: Nominal (no order, e.g., team names) and ordinal (ordered, e.g., player positions).
Numerical data: Discrete (countable, e.g., missed free throws) and continuous (measurable, e.g., height).
Proportions summarize nominal data into numerical form (e.g., 3-point shooting percentage).
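As a rough illustration of the last point, the sketch below turns a list of nominal outcomes into a proportion. The shot log and the `proportion` helper are made up for the example and are not from the lecture.

```python
# Hypothetical 3-point shot log: each entry is a nominal outcome (no order).
shots = ["make", "miss", "make", "make", "miss", "miss", "make", "miss"]

def proportion(values, category):
    """Fraction of entries that equal the given category."""
    return sum(1 for v in values if v == category) / len(values)

# Summarize the nominal data as a single number between 0 and 1.
three_point_pct = proportion(shots, "make")
print(f"3-point shooting percentage: {three_point_pct:.1%}")
```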
Probability Distributions
A probability distribution describes how likely each possible value (or range of values) of a variable is.
Common distributions: Normal (bell curve), uniform (equal probability), bimodal (two peaks), skewed (long tail).
Many real-world measurements (like NBA heights) are approximately normally distributed, though not all data are.
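One way to see the difference between these shapes is to simulate them and compare the mean with the median. This is only a sketch using Python's standard library: the height parameters (mean 200 cm, standard deviation 9 cm) are invented for illustration, and an exponential distribution stands in for "skewed".

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

samples = {
    "normal": [rng.gauss(200, 9) for _ in range(n)],     # bell curve (assumed parameters)
    "uniform": [rng.uniform(0, 1) for _ in range(n)],    # every value in [0, 1] equally likely
    "skewed": [rng.expovariate(1.0) for _ in range(n)],  # long right tail
}

# For symmetric shapes the mean and median nearly agree;
# a long right tail pulls the mean above the median.
summary = {
    name: (statistics.mean(xs), statistics.median(xs))
    for name, xs in samples.items()
}
for name, (mean, median) in summary.items():
    print(f"{name:8s} mean={mean:8.3f} median={median:8.3f}")
```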
Sampling and Estimation
A sample is a subset of data from a larger population.
Sample statistics (like sample mean or proportion) estimate population parameters (unknown true values like long-term shooting percentage).
Larger sample sizes yield more precise (less variable) estimates.
A confidence interval gives a range of plausible values for the true parameter; a 95% confidence interval comes from a procedure that captures the true value in about 95% of repeated samples.
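The link between sample size and precision can be sketched with a 95% confidence interval for a proportion. The example below assumes the common normal-approximation (Wald) formula, which the notes do not specify, and uses made-up shot counts with the same sample proportion at two sample sizes.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion; z=1.96 gives about 95%."""
    p_hat = successes / n                    # sample statistic
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks as n grows
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 36% made shots in a small and a large sample.
small = proportion_ci(18, 50)
large = proportion_ci(180, 500)

print(f"n=50:  ({small[0]:.3f}, {small[1]:.3f})")
print(f"n=500: ({large[0]:.3f}, {large[1]:.3f})")
```

Both intervals are centered on 0.36, but the one built from 500 shots is much narrower, which is what "more precise" means here.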
Parameters and Statistics
Population parameters use Greek symbols: μ (mean), σ (standard deviation), π or θ (proportion), ρ (correlation), β (regression slope).
Sample statistics use Roman letters: x̄ (mean), s (standard deviation), p (proportion), r (correlation), b (slope).
Hypothesis Testing
Hypothesis tests assess evidence for or against a claim about a population parameter.
Null hypothesis (H₀): default assumption (e.g., θ ≤ 0.5).
Alternate hypothesis (H₁): what you seek evidence for (e.g., θ > 0.5).
The result is either to “reject” or “not reject” the null hypothesis—never to “prove” or “accept” anything.
The level of significance (often 0.05) defines the rejection region (when evidence is strong enough to reject H₀).
P-Values
The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in the sample.
If p-value < significance level (e.g., < 0.05), reject the null hypothesis.
Small p-value = strong evidence against H₀; large p-value = weak evidence.
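The decision rule above can be sketched for the example hypotheses H₀: θ ≤ 0.5 versus H₁: θ > 0.5. The code assumes an exact one-sided binomial test, which is one reasonable choice rather than necessarily the lecture's method, and the shot counts (16 makes in 20 attempts) are invented.

```python
from math import comb

def binomial_p_value(successes, n, theta0=0.5):
    """P(X >= successes) when X ~ Binomial(n, theta0): the one-sided p-value."""
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
        for k in range(successes, n + 1)
    )

alpha = 0.05                        # level of significance
p_value = binomial_p_value(16, 20)  # hypothetical: 16 makes in 20 attempts

# Reject H0 only when the p-value falls below the significance level.
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

With 11 makes in 20 attempts instead, the same function returns a p-value well above 0.05, so H₀ would not be rejected.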
P-Hacking and Research Validity
P-hacking: testing many hypotheses and only reporting those with p < 0.05, inflating false-positive findings.
Proper research tests predefined hypotheses; improper research looks for any significant finding post hoc.
Multiple testing increases the chance of finding a false “significant” result.
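A short calculation shows why multiple testing is a problem. Assuming the tests are independent and every null hypothesis is true (simplifying assumptions for this sketch), the chance of at least one false positive among m tests is 1 − (1 − α)^m.

```python
def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 20, 100):
    print(f"{m:3d} tests: {family_wise_error_rate(m):.3f}")
```

Under these assumptions, running 20 tests at the 0.05 level already makes a spurious "significant" result more likely than not.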
Key Terms & Definitions
Categorical Data: data sorted into groups or categories.
Numerical Data: data measured as numbers, either discrete or continuous.
Proportion: a summary metric showing the fraction of observations that fall in a category.
Population Parameter: an unknown, fixed value describing the whole population.
Sample Statistic: a value calculated from a sample and used to estimate a parameter.
Hypothesis Test: a procedure to evaluate evidence about a population parameter.
Null Hypothesis (H₀): the default assumption, typically that there is no effect or difference.
Alternate Hypothesis (H₁): the claim we seek evidence for in a test.
P-Value: the probability, if the null hypothesis is true, of observing a sample statistic at least as extreme as the one obtained.
Confidence Interval: a range of plausible values for a parameter, computed from the data at a stated confidence level.
P-Hacking: misuse of statistical testing by running many unplanned comparisons and reporting only the significant ones.
Action Items / Next Steps
Review the definitions of data types, parameters, and hypothesis test procedures.
Consider further readings or videos on the normal distribution, regression, and standard deviation for deeper understanding.
Be aware of p-hacking and the importance of predefining hypotheses in research.