Overview
This lecture explains how bootstrapping is used to create confidence intervals and compares it with true sampling distributions and formula-based approaches.
Bootstrapping and Sampling Distributions
- Bootstrapping simulates the sampling distribution by repeatedly resampling, with replacement, from the original sample; each resample is the same size as the original sample.
- A real sampling distribution involves taking many random samples from the population.
- In bootstrapping, the center of the distribution is the sample statistic, not the population parameter.
- The shape and standard error of a bootstrap distribution often resemble those of a real sampling distribution.
- Each bootstrap run gives slightly different results because the resamples are drawn at random.
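The points above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own procedure: the helper name `bootstrap_distribution`, the number of resamples, and the sample itself are assumptions. The data are synthetic stand-in values, not the body temperature data used in the lecture.

```python
import random
import statistics


def bootstrap_distribution(sample, stat=statistics.mean, n_resamples=2000, seed=0):
    """Compute `stat` on many resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # rng.choices samples WITH replacement; k=n keeps each resample the
    # same size as the original sample.
    return [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Synthetic stand-in sample (hypothetical values, NOT the lecture's data).
data_rng = random.Random(42)
sample = [round(data_rng.gauss(98.3, 0.7), 1) for _ in range(50)]

boot_means = bootstrap_distribution(sample)

# The bootstrap distribution is centered on the sample statistic,
# and its standard deviation estimates the standard error.
print("sample mean:             ", round(statistics.mean(sample), 3))
print("bootstrap center:        ", round(statistics.mean(boot_means), 3))
print("bootstrap standard error:", round(statistics.stdev(boot_means), 3))
```

Changing `seed` gives slightly different numbers, which mirrors the run-to-run variation described above.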
Creating Confidence Intervals with Bootstrapping
- Bootstrapping allows for direct calculation of confidence intervals from the simulated distribution.
- A 95% confidence interval is found by taking the middle 95% of the bootstrap results, that is, the values between the 2.5th and 97.5th percentiles of the bootstrap distribution.
- For the given data, the 95% confidence interval for the population mean is 98.053 to 98.480 degrees Fahrenheit.
- Results may vary slightly with each bootstrap due to randomness in resampling.
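As a rough sketch of the "middle 95%" idea, the interval can be read off as the 2.5th and 97.5th percentiles of the bootstrap means. The sample here is synthetic (hypothetical values, not the lecture's data), so the printed interval will not match the one quoted above; the resample count of 5000 is also an arbitrary choice.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic stand-in sample (hypothetical values, NOT the lecture's data).
sample = [round(rng.gauss(98.3, 0.7), 1) for _ in range(50)]

# Bootstrap distribution of the mean: resample with replacement,
# keeping each resample the same size as the original sample.
boot_means = [
    statistics.mean(rng.choices(sample, k=len(sample)))
    for _ in range(5000)
]

# n=40 yields cut points every 2.5%, so the first and last cut points
# are the 2.5th and 97.5th percentiles: the edges of the middle 95%.
cuts = statistics.quantiles(boot_means, n=40)
ci_lower, ci_upper = cuts[0], cuts[-1]

print(f"95% bootstrap CI for the mean: {ci_lower:.3f} to {ci_upper:.3f}")
```

Using a different seed in `random.Random` shifts the endpoints slightly, which is the resampling randomness noted above.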
Comparison: Bootstrapping vs. Traditional Formulas
- StatKey uses bootstrapping, while Statcato uses traditional normal-based formulas for confidence intervals.
- Bootstrapping does not require normality or formula assumptions; it works directly from the sample data, although it still relies on the sample being a random, representative sample of the population.
- Bootstrapping can be used for parameters like the median and standard deviation, which are harder to handle with formulas.
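To make the comparison concrete, the sketch below computes both kinds of interval for a mean on the same sample. It assumes the formula-based interval is the normal approximation, sample mean plus or minus z times s divided by the square root of n; the lecture does not spell out the exact formula, and a particular tool may use a t-based variant instead. The data are synthetic stand-in values, not the lecture's data.

```python
import math
import random
import statistics

rng = random.Random(7)

# Synthetic stand-in sample (hypothetical values, NOT the lecture's data).
sample = [round(rng.gauss(98.3, 0.7), 1) for _ in range(50)]
n = len(sample)
sample_mean = statistics.mean(sample)

# Formula-based interval (assumed form): mean +/- z * s / sqrt(n),
# where z is the 97.5th percentile of the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * statistics.stdev(sample) / math.sqrt(n)
formula_ci = (sample_mean - margin, sample_mean + margin)

# Bootstrap percentile interval: middle 95% of the resampled means.
boot_means = [statistics.mean(rng.choices(sample, k=n)) for _ in range(5000)]
cuts = statistics.quantiles(boot_means, n=40)
bootstrap_ci = (cuts[0], cuts[-1])

print(f"formula-based CI: {formula_ci[0]:.3f} to {formula_ci[1]:.3f}")
print(f"bootstrap CI:     {bootstrap_ci[0]:.3f} to {bootstrap_ci[1]:.3f}")
```

For a roughly bell-shaped sample like this one, the two intervals should land close together. The bootstrap version needs no formula for the standard error, which is why it extends to statistics such as the median.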
Bootstrapping for Other Statistics
- You can bootstrap the median to create a confidence interval for the population median.
- You can also bootstrap the standard deviation, which can be challenging with traditional formulas.
- Example: The 95% confidence interval for standard deviation of body temperature is between 0.572 and 0.953 degrees Fahrenheit.
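The same resampling recipe works for other statistics by swapping the function applied to each resample. The sketch below is illustrative only: `bootstrap_ci_95` is a hypothetical helper name, and the sample is synthetic, so the output will not reproduce the 0.572 to 0.953 interval quoted above.

```python
import random
import statistics


def bootstrap_ci_95(sample, stat, n_resamples=3000, seed=0):
    """Percentile 95% interval for `stat`, using resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    boot = [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    # First and last of the 39 cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(boot, n=40)
    return cuts[0], cuts[-1]


# Synthetic stand-in sample (hypothetical values, NOT the lecture's data).
data_rng = random.Random(3)
sample = [round(data_rng.gauss(98.3, 0.7), 1) for _ in range(50)]

median_ci = bootstrap_ci_95(sample, statistics.median)
sd_ci = bootstrap_ci_95(sample, statistics.stdev)

print(f"95% CI for the median:             {median_ci[0]:.3f} to {median_ci[1]:.3f}")
print(f"95% CI for the standard deviation: {sd_ci[0]:.3f} to {sd_ci[1]:.3f}")
```

Only the `stat` argument changes between the two calls; the resampling and percentile steps are identical.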
Key Terms & Definitions
- Bootstrapping — A method of creating sampling distributions by repeatedly sampling with replacement from a single sample.
- Sampling Distribution — The probability distribution of a statistic based on all possible random samples from a population.
- Confidence Interval — A range of values derived from the data that likely contains the population parameter.
- Sample Statistic — A numerical summary (like mean or median) calculated from the sample data.
Action Items / Next Steps
- Practice using bootstrapping to calculate confidence intervals for mean, median, and standard deviation.
- Explore StatKey or similar tools to perform your own bootstrap analyses.
- Review the differences between formula-based and bootstrap confidence intervals.