Bootstrapping and Confidence Intervals

Overview

This lecture explains how bootstrapping is used to create confidence intervals and compares it with true sampling distributions and formula-based approaches.

Bootstrapping and Sampling Distributions

Bootstrapping simulates the sampling distribution by repeatedly resampling, with replacement, from the original sample; each resample is the same size as the original sample.
A real sampling distribution involves taking many random samples from the population.
In bootstrapping, the center of the distribution is the sample statistic, not the population parameter.
The shape and standard error of a bootstrap distribution often resemble those of a real sampling distribution.
Each bootstrap run gives slightly different results because the resamples are drawn at random.
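
The resampling idea above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the procedure StatKey runs internally. The function name bootstrap_distribution, the seed, the number of resamples, and the temperature readings are all assumptions made up for the example; they are not the lecture's data.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, n_resamples=5000, seed=0):
    """Return one value of `statistic` per bootstrap resample."""
    rng = random.Random(seed)
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        # Draw n values from the original sample WITH replacement.
        resample = rng.choices(sample, k=n)
        results.append(statistic(resample))
    return results

# Hypothetical body-temperature readings (degrees Fahrenheit), for illustration only.
sample = [97.8, 98.6, 98.2, 97.9, 98.4, 98.0, 98.7, 98.3, 97.6, 98.1,
          98.5, 98.2, 98.9, 97.7, 98.4]

boot_means = bootstrap_distribution(sample, statistics.mean)

# The bootstrap distribution is centered near the SAMPLE mean,
# and its standard deviation estimates the standard error.
print("sample mean:         ", round(statistics.mean(sample), 3))
print("bootstrap center:    ", round(statistics.mean(boot_means), 3))
print("bootstrap std. error:", round(statistics.stdev(boot_means), 3))
```

Changing the seed gives a slightly different center and standard error, which is the run-to-run variation described above.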

Creating Confidence Intervals with Bootstrapping

Bootstrapping allows for direct calculation of confidence intervals from the simulated distribution.
A 95% confidence interval is the middle 95% of the bootstrap statistics, that is, the range from the 2.5th percentile to the 97.5th percentile.
For the given data, the 95% confidence interval for the population mean is 98.053 to 98.480 degrees Fahrenheit.
Results may vary slightly with each bootstrap due to randomness in resampling.
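
Taking the middle 95% amounts to sorting the bootstrap statistics and cutting off 2.5% in each tail. The sketch below shows one simple way to do that, assuming a hypothetical helper named percentile_interval and made-up temperature readings; real tools may interpolate percentiles slightly differently, so endpoints can differ in the last decimal place.

```python
import random
import statistics

def percentile_interval(values, confidence=0.95):
    """Return the endpoints of the middle `confidence` share of `values`."""
    ordered = sorted(values)
    # Number of values to drop from each tail (2.5% each for 95% confidence).
    n_tail = round((1 - confidence) / 2 * len(ordered))
    return ordered[n_tail], ordered[len(ordered) - 1 - n_tail]

# Hypothetical body-temperature readings (degrees Fahrenheit), for illustration only.
sample = [97.8, 98.6, 98.2, 97.9, 98.4, 98.0, 98.7, 98.3, 97.6, 98.1,
          98.5, 98.2, 98.9, 97.7, 98.4]

rng = random.Random(42)  # fixed seed so the run is repeatable
boot_means = [statistics.mean(rng.choices(sample, k=len(sample)))
              for _ in range(5000)]

lower, upper = percentile_interval(boot_means)
print(f"95% bootstrap interval for the mean: {lower:.3f} to {upper:.3f}")
```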

Comparison: Bootstrapping vs. Traditional Formulas

StatKey uses bootstrapping while StatKado uses traditional normal formulas for confidence intervals.
Bootstrapping does not require a normality assumption or a standard error formula; it works directly from the sample data, although it still relies on the sample being representative of the population.
Bootstrapping can be used for parameters like the median and standard deviation, which are harder to handle with formulas.
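
To make the comparison concrete, the sketch below computes both kinds of interval for the mean of the same sample. It assumes the formula-based interval has the form mean ± z·s/√n with a normal critical value; the exact formula a given package uses may differ (for example, a t critical value). The data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical body-temperature readings (degrees Fahrenheit), for illustration only.
sample = [97.8, 98.6, 98.2, 97.9, 98.4, 98.0, 98.7, 98.3, 97.6, 98.1,
          98.5, 98.2, 98.9, 97.7, 98.4]
n = len(sample)
mean = statistics.mean(sample)

# Formula-based interval: mean ± z * s / sqrt(n), with a normal critical value.
z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
margin = z * statistics.stdev(sample) / n ** 0.5
formula_ci = (mean - margin, mean + margin)

# Bootstrap percentile interval: middle 95% of 5000 resampled means.
rng = random.Random(1)
boot_means = sorted(statistics.mean(rng.choices(sample, k=n))
                    for _ in range(5000))
bootstrap_ci = (boot_means[125], boot_means[4874])  # drop 125 values per tail

print("formula:   %.3f to %.3f" % formula_ci)
print("bootstrap: %.3f to %.3f" % bootstrap_ci)
```

For a mean, the two intervals usually land close together; the bootstrap version simply gets there without a standard error formula.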

Bootstrapping for Other Statistics

You can bootstrap the median to create a confidence interval for the population median.
You can also bootstrap the standard deviation, which can be challenging with traditional formulas.
Example: The 95% confidence interval for the population standard deviation of body temperature is 0.572 to 0.953 degrees Fahrenheit.
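
The same resampling recipe works for other statistics by swapping in a different function. The sketch below is one possible version, assuming a hypothetical helper named bootstrap_ci and made-up readings, so its output will not match the interval quoted above.

```python
import random
import statistics

def bootstrap_ci(sample, statistic, confidence=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap interval for any statistic of a single sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Compute the statistic on each resample (drawn with replacement), then sort.
    stats = sorted(statistic(rng.choices(sample, k=n))
                   for _ in range(n_resamples))
    n_tail = round((1 - confidence) / 2 * n_resamples)
    return stats[n_tail], stats[n_resamples - 1 - n_tail]

# Hypothetical body-temperature readings (degrees Fahrenheit), for illustration only.
sample = [97.8, 98.6, 98.2, 97.9, 98.4, 98.0, 98.7, 98.3, 97.6, 98.1,
          98.5, 98.2, 98.9, 97.7, 98.4]

median_ci = bootstrap_ci(sample, statistics.median)
sd_ci = bootstrap_ci(sample, statistics.stdev)

print("95% interval for the median:            ", median_ci)
print("95% interval for the standard deviation:", sd_ci)
```

With a small sample, the bootstrap medians take only a handful of distinct values, so the median interval's endpoints tend to be actual data values.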

Key Terms & Definitions

Bootstrapping — A method of approximating a sampling distribution by repeatedly sampling with replacement from a single sample.
Sampling Distribution — The probability distribution of a statistic over all possible random samples of a given size from a population.
Confidence Interval — A range of values, computed from sample data, that is likely to contain the population parameter; at 95% confidence, the method captures the parameter in about 95% of repeated samples.
Sample Statistic — A numerical summary (like mean or median) calculated from the sample data.

Action Items / Next Steps

Practice using bootstrapping to calculate confidence intervals for the mean, median, and standard deviation.
Explore StatKey or similar tools to perform your own bootstrap analyses.
Review the differences between formula-based and bootstrap confidence intervals.