SWiRL: learning statistics module
Oct 15, 2024
Introduction to Statistics Lecture Notes
Introduction
Presenter: Justin Zeltzer from zstatistics.com
Challenge: Explain statistics in under half an hour without maths.
Goal: Develop intuition about statistics for beginners or the curious.
Examples themed around the NBA.
Types of Data
Categorical Data
Nominal: No order (e.g., NBA teams)
Ordinal: Some order (e.g., player positions)
Numerical Data
Discrete: Countable values (e.g., number of free throws missed)
Continuous: Any value within a range (e.g., player's height)
Proportions: Aggregate nominal data into numerical summaries.
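As a rough illustration of how a proportion turns nominal data into a number, the sketch below counts categories and divides by the total. The team list is made-up sample data and `proportions` is a hypothetical helper name, not something from the lecture.

```python
from collections import Counter

# Hypothetical nominal data: the team of each player in a small sample.
teams = ["Warriors", "Lakers", "Warriors", "Celtics", "Warriors", "Lakers"]


def proportions(values):
    """Return the share of observations that fall in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


team_shares = proportions(teams)
for team, share in team_shares.items():
    print(f"{team}: {share:.2f}")
```

The category labels themselves have no numeric meaning, but each share is a number between 0 and 1, and the shares sum to 1.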
Distributions
Probability Density Function: Describes the distribution of a population.
Normal Distribution: Occurs commonly; symmetric about the mean, with extreme values increasingly rare.
Other Distributions
Uniform Distribution: Equal probability across all values.
Bimodal Distribution: Two peaks in the data.
Skewed Distribution: Longer tail on one side (left or right skew).
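One way to get a feel for these shapes is to draw random values from each and compare the mean with the median. This is only a sketch: the exponential distribution is used here as an assumed stand-in for a right-skewed population, and the parameters are arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Arbitrary parameters; the exponential draw is an assumed example of right skew.
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(0, 1) for _ in range(n)],
    "right-skewed": [rng.expovariate(1.0) for _ in range(n)],
}

# Map each distribution name to its (mean, median) pair.
summary = {
    name: (statistics.mean(xs), statistics.median(xs))
    for name, xs in samples.items()
}

for name, (mean, median) in summary.items():
    print(f"{name:>12}: mean={mean:.3f} median={median:.3f}")
```

For the symmetric shapes the mean and median land close together; with a long right tail the mean is pulled above the median.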
Sampling Distributions
Sampling: Selecting a group from a population in order to infer something about that population.
Distribution of Sample Means: Narrower than the population distribution.
Variance Reduction: Larger samples yield less variance in sample means.
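The narrowing of the sample-mean distribution can be checked with a small simulation. The sketch below assumes a made-up population of player heights (normal, mean 200 cm, standard deviation 10 cm); `sample_mean_spread` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable


def sample_mean_spread(sample_size, repeats=2000):
    """Standard deviation of the means of many samples of one size."""
    means = []
    for _ in range(repeats):
        # Assumed population: heights ~ Normal(mean=200 cm, sd=10 cm).
        sample = [rng.gauss(200, 10) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


spread_small = sample_mean_spread(5)
spread_large = sample_mean_spread(50)
print(f"n=5:  spread of sample means = {spread_small:.2f}")
print(f"n=50: spread of sample means = {spread_large:.2f}")
```

Theory says the spread of sample means is the population standard deviation divided by the square root of the sample size, so the n=50 spread should come out at roughly a third of the n=5 spread.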
Estimation
Sample Statistic: An estimate of a population parameter, calculated from a sample.
Theta (θ): Symbol for an unknown parameter.
Confidence Intervals: A range, built from the sample, in which the parameter likely falls.
Example: Comparing Steph Curry's and Meyers Leonard's three-point shooting.
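A confidence interval for a shooting percentage can be sketched with the normal-approximation formula. The attempt counts below are invented for illustration and are not either player's real record; `proportion_ci` is a hypothetical helper, and the approximation is only reasonable when the sample is not tiny.

```python
import math


def proportion_ci(made, attempts, z=1.96):
    """Approximate 95% confidence interval for a true success rate.

    Uses the normal approximation p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n),
    clipped to the valid range [0, 1].
    """
    p_hat = made / attempts
    margin = z * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Invented numbers: both shooters hit 40%, but on very different volumes.
high_volume = proportion_ci(made=400, attempts=1000)
low_volume = proportion_ci(made=12, attempts=30)

print(f"1000 attempts: {high_volume[0]:.3f} to {high_volume[1]:.3f}")
print(f"  30 attempts: {low_volume[0]:.3f} to {low_volume[1]:.3f}")
```

Both samples give the same estimate, but the interval from 30 attempts is much wider, so far less can be said about that shooter's true percentage.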
Parameters and Estimation
Common Parameters
Mu (μ): Mean
Sigma (σ): Standard deviation
Pi (π): Proportion
Rho (ρ): Correlation
Beta (β): Gradient (regression)
Sample Statistics: Estimates of parameters (e.g., x-bar for the mean).
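Each parameter above has a sample statistic that estimates it. The sketch below computes them by hand for a tiny invented data set (heights in cm and rebounds per game for five hypothetical players), so the variable names and numbers are assumptions for illustration only.

```python
import math
import statistics

# Invented paired data for five hypothetical players.
heights = [185, 190, 198, 203, 211]    # cm
rebounds = [3.0, 4.0, 6.0, 7.0, 10.0]  # per game
is_guard = [True, True, False, False, False]

x_bar = statistics.mean(heights)       # estimates mu
s = statistics.stdev(heights)          # estimates sigma
p_hat = sum(is_guard) / len(is_guard)  # estimates pi

# Sums of squared deviations and cross-products around the means.
y_bar = statistics.mean(rebounds)
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in rebounds)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, rebounds))

r = sxy / math.sqrt(sxx * syy)         # estimates rho
b = sxy / sxx                          # estimates beta (regression gradient)

print(f"x_bar={x_bar:.1f} s={s:.2f} p_hat={p_hat:.2f} r={r:.3f} b={b:.3f}")
```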
Hypothesis Testing
Hypothesis Test: Weighs the evidence against a null hypothesis (H0).
Alternate Hypothesis: What you seek evidence for.
Rejection Region: The range of sample results extreme enough that the null hypothesis is rejected.
Significance Level: Commonly set at 5%.
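To make the rejection region concrete, here is a sketch of a one-sided test on a made-up scenario: the null hypothesis says a shooter's true success rate is 50%, and we watch 20 attempts. The scenario, the function names, and the choice of an exact binomial calculation are all assumptions for illustration.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p_null, alpha=0.05):
    """Smallest count k with P(X >= k | H0 true) <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p_null) <= alpha:
            return k
    return None  # no count is extreme enough at this alpha


# Hypothetical test: H0 says the true rate is 0.5; 20 attempts observed.
cutoff = critical_value(n=20, p_null=0.5)
print(f"Reject H0 at the 5% level if makes >= {cutoff}")
```

With these numbers the rejection region is 15 or more makes: if the null hypothesis were true, a result that high would occur only about 2% of the time, while 14 or more would occur nearly 6% of the time.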
P-Values
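Definition: Measures how extreme a sample is: the probability of seeing a result at least this extreme if the null hypothesis were true.
Interpretation: A lower p-value indicates more evidence against the null hypothesis.
Relation to Hypothesis Tests: If the p-value is below the significance level (e.g., 0.05), the sample falls in the rejection region.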
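A p-value for a simple case can be computed directly. The sketch below assumes an invented result of 16 makes in 20 attempts and a null hypothesis of a 50% true success rate; `binomial_p_value` is a hypothetical helper for a one-sided exact binomial test.

```python
from math import comb


def binomial_p_value(observed, n, p_null):
    """One-sided p-value: P(X >= observed) if X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(observed, n + 1)
    )


# Invented observation: 16 makes in 20 attempts, tested against H0: rate = 0.5.
p_value = binomial_p_value(observed=16, n=20, p_null=0.5)
reject_null = p_value < 0.05

print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject_null}")
```

The p-value comes out near 0.006, well under 0.05, which is the same as saying 16 makes lies inside the 5% rejection region for this test.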
P-Hacking
Definition: Manipulating data or analysis choices until a significant result appears.
Problem in Research: Testing many hypotheses raises the chance that at least one looks significant purely by chance (a false positive).
Good vs. Bad Research: Good research tests a single hypothesis; bad research tests many and reports only the significant findings.
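The false-positive problem can be quantified. If every null hypothesis is true and each test uses a 5% significance level, the chance that at least one of m independent tests comes out significant is 1 - 0.95^m. The sketch below checks that figure by simulation, relying on the fact that p-values from a continuous test are uniformly distributed when the null is true; the choice of 20 tests per study is an arbitrary assumption.

```python
import random

ALPHA = 0.05
TESTS_PER_STUDY = 20  # arbitrary number of hypotheses tried in one study

# Chance that at least one of the independent tests is "significant".
analytic = 1 - (1 - ALPHA) ** TESTS_PER_STUDY

rng = random.Random(2)  # fixed seed so the run is repeatable
studies = 10_000
studies_with_false_positive = 0
for _ in range(studies):
    # Under a true null hypothesis each p-value is a uniform draw on [0, 1).
    p_values = [rng.random() for _ in range(TESTS_PER_STUDY)]
    if min(p_values) < ALPHA:
        studies_with_false_positive += 1

simulated = studies_with_false_positive / studies
print(f"analytic:  {analytic:.3f}")
print(f"simulated: {simulated:.3f}")
```

With 20 tests the chance of at least one spurious "finding" is about 64%, which is why reporting only the significant results is misleading.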
Conclusion
Statistics deals with uncertainty and inference rather than proof.
Importance of understanding potential for misuse in research (p-hacking).
Additional resources and in-depth discussions available on zstatistics.com.