Understanding P-Values and Simulations

Overview

This lecture explains how to calculate the p-value using traditional and simulation-based methods, focusing on randomized simulations to visualize sampling variability under the null hypothesis.

Definition and Calculation of P-Value

The p-value is the probability, assuming the null hypothesis is true, of obtaining a sample statistic at least as extreme as the one observed.
There are two main ways to calculate a p-value: traditional (theoretical) and modern (simulation/randomization).
The randomization approach uses computer simulations to generate a sampling distribution under the null hypothesis.
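Written as formulas, the randomization estimate is the fraction of simulated statistics at least as extreme as the observed one, in the direction(s) the alternative hypothesis specifies. The notation is introduced here only for illustration: B is the number of samples simulated under H₀, θ*_b is the statistic of simulated sample b (b = 1, …, B), θ_obs is the observed statistic, and θ₀ is the value stated in the null hypothesis. The last line applies the left-tailed formula to the body temperature example below.

```latex
\begin{aligned}
\text{left-tailed:}  \quad & p \approx \frac{\#\{\, b : \hat{\theta}^*_b \le \hat{\theta}_{\text{obs}} \,\}}{B} \\
\text{right-tailed:} \quad & p \approx \frac{\#\{\, b : \hat{\theta}^*_b \ge \hat{\theta}_{\text{obs}} \,\}}{B} \\
\text{two-tailed:}   \quad & p \approx \frac{\#\{\, b : |\hat{\theta}^*_b - \theta_0| \ge |\hat{\theta}_{\text{obs}} - \theta_0| \,\}}{B} \\
\text{body temperature example:} \quad & p \approx \frac{2}{250} = 0.008
\end{aligned}
```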

Randomized Simulation Approach

The simulation assumes the null hypothesis is true and generates many random samples of the same size as the original sample.
The statistic (mean or proportion) of each simulated sample is calculated; together these statistics form a distribution centered on the value stated in the null hypothesis.
The real sample statistic is compared to this simulated distribution to assess how extreme it is.
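The comparison step can be sketched as a small Python function, assuming the simulated statistics are already stored in a list. The function name p_value_from_null and its tail argument are hypothetical, chosen for illustration; they are not part of StatKey or any other tool mentioned in the lecture.

```python
from typing import Sequence


def p_value_from_null(null_stats: Sequence[float], observed: float,
                      null_value: float, tail: str) -> float:
    """Fraction of simulated null statistics at least as extreme as `observed`.

    tail = "left":  count statistics <= observed
    tail = "right": count statistics >= observed
    tail = "two":   count statistics at least as far from null_value
                    as the observed statistic, in either direction
    """
    if not null_stats:
        raise ValueError("need at least one simulated statistic")
    if tail == "left":
        hits = sum(1 for s in null_stats if s <= observed)
    elif tail == "right":
        hits = sum(1 for s in null_stats if s >= observed)
    elif tail == "two":
        gap = abs(observed - null_value)
        hits = sum(1 for s in null_stats if abs(s - null_value) >= gap)
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return hits / len(null_stats)


# Toy null distribution (made-up numbers, centered on a null value of 10)
null_stats = [7, 8, 9, 9, 10, 10, 10, 11, 11, 12]
print(p_value_from_null(null_stats, 8, 10, "left"))    # → 0.2  (7 and 8)
print(p_value_from_null(null_stats, 12, 10, "right"))  # → 0.1  (12)
print(p_value_from_null(null_stats, 8, 10, "two"))     # → 0.3  (7, 8 and 12)
```

The two-tailed branch measures distance from the null value, which matches the "as far or farther from the center" rule used in the proportion example. Another common convention doubles the smaller one-tail proportion; for a roughly symmetric null distribution the two give similar answers.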

Example: Body Temperature (Mean)

Null hypothesis: mean = 98.6°F; alternative: mean < 98.6°F (left-tailed test).
Simulations are run assuming a population mean of 98.6°F, generating many sample means.
The p-value is the proportion of simulated sample means less than or equal to the observed mean (e.g., 2 out of 250 simulated means gives a p-value of 2/250 = 0.008).
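A minimal Python sketch of this left-tailed simulation follows. The lecture's data are not given here, so the temperatures below are made up for illustration. The way the null is simulated is also an assumption: one common randomization scheme shifts the sample so its mean is exactly 98.6°F and then resamples from it with replacement, keeping the original sample size. The helper name shifted_null_means is hypothetical.

```python
import random
from statistics import fmean


def shifted_null_means(sample, null_mean, n_sims, rng):
    """Simulate sample means assuming H0 is true: shift the data so its mean
    equals null_mean, then resample with replacement at the original size."""
    shift = null_mean - fmean(sample)
    shifted = [x + shift for x in sample]
    n = len(sample)
    return [fmean(rng.choices(shifted, k=n)) for _ in range(n_sims)]


# Hypothetical temperatures in °F (made up, not the lecture's data)
temps = [98.0, 98.9, 97.6, 98.6, 99.1, 98.2, 97.9, 98.8, 98.4, 98.3]
observed_mean = fmean(temps)

rng = random.Random(0)  # fixed seed so the run is repeatable
null_means = shifted_null_means(temps, 98.6, 250, rng)

# Left-tailed test (Ha: mean < 98.6): count simulated means <= observed mean
hits = sum(1 for m in null_means if m <= observed_mean)
p_value = hits / len(null_means)
print(f"observed mean = {observed_mean:.2f}, p-value ≈ {p_value:.3f}")

# The lecture's arithmetic: 2 of 250 simulated means were that extreme
print(2 / 250)  # → 0.008
```

Because the null distribution comes from random resampling, the p-value changes slightly from run to run (or seed to seed); more simulations give a more stable estimate.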

Example: Proportion Simulation (Two-Tailed Test)

Null hypothesis: proportion = 0.5; alternative: proportion ≠ 0.5.
The computer simulates a sampling distribution of sample proportions centered at 0.5.
The observed sample proportion is compared to both tails of the distribution.
The p-value is found by adding the proportions of simulated samples at least as far from the center as the observed statistic, in both directions (e.g., about 0.192 in each tail, for a two-tailed p-value of about 0.384).
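The two-tailed proportion test can be sketched in Python as well. The sample size and observed count below are hypothetical (the lecture does not give them here), and the helper names simulate_null_counts and two_tailed_p_value are made up for illustration. Each simulated sample flips n "coins" that succeed with probability 0.5, which is what the null hypothesis asserts.

```python
import random


def simulate_null_counts(n, p0, n_sims, rng):
    """Number of successes in each of n_sims samples of size n when the true
    proportion is p0, i.e., assuming H0 is true."""
    return [sum(rng.random() < p0 for _ in range(n)) for _ in range(n_sims)]


def two_tailed_p_value(null_counts, observed_count, n, p0):
    """Fraction of simulated samples at least as far from the null center
    (n * p0 successes) as the observed sample, in either direction."""
    center = n * p0
    gap = abs(observed_count - center)
    hits = sum(1 for k in null_counts if abs(k - center) >= gap)
    return hits / len(null_counts)


# Hypothetical data (not from the lecture): 28 successes in n = 50 trials
n, observed_count, p0 = 50, 28, 0.5
rng = random.Random(1)  # fixed seed for repeatability
null_counts = simulate_null_counts(n, p0, 1000, rng)

p_value = two_tailed_p_value(null_counts, observed_count, n, p0)
print(f"observed p-hat = {observed_count / n:.2f}, two-tailed p-value ≈ {p_value:.3f}")
```

Comparing distances in counts rather than in proportions keeps ties exact when a simulated sample lands exactly as far from the center as the observed one. With about 0.192 in each tail, as in the lecture's example, the same counting gives 0.192 + 0.192 = 0.384.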

Key Terms & Definitions

P-value: The probability, assuming the null hypothesis is true, of obtaining a sample result at least as extreme as the one observed.
Null hypothesis (H₀): The assumption that there is no effect or difference; the baseline for testing.
Alternative hypothesis (H₁ or Ha): The hypothesis suggesting an effect or difference exists.
Sampling distribution: The distribution of a statistic (e.g., a mean or proportion) across many random samples of the same size; when the samples are generated assuming H₀ is true, it is often called the null distribution.
Left-tailed test: Test where the alternative hypothesis predicts lower values than the null (e.g., mean < null value).
Two-tailed test: Test where deviations in both directions from the null are considered (e.g., mean ≠ null value).

Action Items / Next Steps

Practice using simulation tools (e.g., StatKey) to generate sampling distributions under the null hypothesis.
Review how to identify and calculate p-values for left-, right-, and two-tailed tests.
Prepare for exercises involving both mean and proportion hypothesis tests.