Understanding t-Distribution and Inference Techniques

May 7, 2025

Unit 7: Inference for Quantitative Data - Means

The t-Distribution

  • Used when the population standard deviation (σ) is unknown and must be estimated by the sample standard deviation (s).
  • Introduced in 1908 by W. S. Gosset.
  • When the population is normally distributed, the t-distribution is bell-shaped and symmetric but more spread out than the standard normal distribution; the two grow closer as the degrees of freedom increase.
  • Depends on the degrees of freedom (df); for a one-sample procedure, df = n - 1 (sample size minus one).
  • The proper choice whenever the population standard deviation is unknown, which covers most real-world cases.
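
The extra spread of the t-distribution can be seen with a small simulation. The sketch below is illustrative only: it assumes a standard normal population, and the function names, trial count, and seed are arbitrary choices rather than anything from the unit. It counts how often the t statistic falls beyond ±1.96, the cutoff that leaves 5% in the tails of a standard normal.

```python
import math
import random


def t_statistic(sample, mu):
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample standard deviation, using the n - 1 divisor.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / (s / math.sqrt(n))


def fraction_beyond(cutoff, n, trials=10_000, seed=1):
    """Share of simulated t statistics with |t| > cutoff for samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / trials


# With n = 5 (df = 4) the statistic lands beyond 1.96 well over 5% of the
# time; with n = 50 (df = 49) the share is much closer to 5%.
small = fraction_beyond(1.96, n=5)
large = fraction_beyond(1.96, n=50)
print(f"n=5: {small:.3f}   n=50: {large:.3f}")
```

The gap between the two shares is why t critical values are larger than z critical values for small samples.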

Confidence Interval for a Mean

  • If the sample size (n) is large:
    • Sample means are approximately normally distributed (central limit theorem).
    • The mean of the sample means equals the population mean (μ).
    • The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size (σ/√n).
  • When σ is unknown, replace it with the sample standard deviation s; the resulting estimate, s/√n, is called the standard error.
  • Example 7.1: Gas mileage study with confidence intervals and significance testing.
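
A one-sample t-interval has the form x̄ ± t*·s/√n. The following is a minimal sketch of that calculation, not the worked solution to Example 7.1: the mileage readings are hypothetical, the function name is illustrative, and the critical value t* is assumed to come from a t-table for df = n - 1.

```python
import math


def mean_confidence_interval(sample, t_star):
    """Return (lower, upper) for x_bar ± t_star * s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    standard_error = s / math.sqrt(n)
    margin = t_star * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical gas-mileage readings (mpg), invented for illustration.
mpg = [27.1, 28.4, 26.5, 29.0, 27.8, 28.2, 26.9, 27.5, 28.8, 27.3]
# t* for 95% confidence with df = 10 - 1 = 9, read from a t-table.
low, high = mean_confidence_interval(mpg, t_star=2.262)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```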

Significance Test for a Mean

  • Requires:
    • Simple random sample.
    • Sample size less than 10% of the population when sampling without replacement.
    • A sample large enough for the central limit theorem (CLT) to apply, or an approximately normal population distribution.
  • Example 7.2: Testing manufacturer's claim about electricity usage with significance levels.
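
Once the conditions hold, the test statistic is t = (x̄ - μ0)/(s/√n) with df = n - 1. The sketch below computes it and approximates a two-sided P-value by numerically integrating the t density with Simpson's rule, which is a stand-in for a table or calculator. The electricity readings and the claimed mean are hypothetical, and all function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|); steps must be even for Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central_half)


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical daily electricity usage (kWh) against a claimed mean of 30.
usage_kwh = [31.2, 29.8, 30.5, 32.1, 30.9, 31.7, 29.5, 30.8, 31.4, 30.1]
t, df = one_sample_t(usage_kwh, mu0=30.0)
print(f"t = {t:.3f}, df = {df}, two-sided P = {two_sided_p_value(t, df):.4f}")
```

Reject the null hypothesis when the P-value is below the chosen significance level.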

Confidence Interval for the Difference of Two Means

  • Under the conditions for inference, the sampling distribution of the difference of sample means is approximately normal.
  • Use a t-distribution when the population standard deviations are unknown.
  • Example 7.3: Comparing accident rates between two departments with confidence intervals.
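
For two independent samples, the standard error of x̄1 - x̄2 is √(s1²/n1 + s2²/n2). The sketch below computes that standard error together with the Welch-Satterthwaite approximation to the degrees of freedom, which is one common choice and not necessarily the one used in Example 7.3 (a conservative alternative is the smaller of n1 - 1 and n2 - 1). The summary statistics are hypothetical and the function name is illustrative.

```python
import math


def welch_se_and_df(s1, n1, s2, n2):
    """Standard error of x_bar1 - x_bar2 and the Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    standard_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return standard_error, df


# Hypothetical monthly accident counts summarized for two departments.
mean1, s1, n1 = 4.2, 2.0, 10
mean2, s2, n2 = 3.1, 2.0, 10

se, df = welch_se_and_df(s1, n1, s2, n2)
# With equal spreads and sizes the formula gives df = 18, so t* = 2.101
# (95% confidence) is read from a t-table.
t_star = 2.101
diff = mean1 - mean2
print(f"df = {df:.1f}")
print(f"95% CI for mu1 - mu2: ({diff - t_star * se:.2f}, {diff + t_star * se:.2f})")
```

An interval that contains 0 means the data are consistent with no difference between the two population means.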

Significance Test for the Difference of Two Means

  • Requires two independent simple random samples and either normally distributed populations or large enough sample sizes.
  • The null hypothesis usually states that the two population means are equal (μ1 = μ2).
  • Example 7.4: Comparing computer downtime between two companies with type I and type II errors.
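
The two kinds of error can be illustrated by simulation. The sketch below is not the analysis from Example 7.4: it assumes normal populations with standard deviation 1, samples of size 10, and the conservative cutoff t* = 2.262 (95%, df = min(n1, n2) - 1 = 9); every name and setting is illustrative. When the means really are equal, each rejection is a Type I error; when they differ, each failure to reject is a Type II error.

```python
import math
import random


def two_sample_t(a, b):
    """Unpooled two-sample t statistic for H0: mu_a = mu_b."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)


def rejection_rate(mu_a, mu_b, n=10, t_star=2.262, trials=5_000, seed=7):
    """Share of simulated two-sided tests that reject H0 at cutoff t_star."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(mu_a, 1.0) for _ in range(n)]
        b = [rng.gauss(mu_b, 1.0) for _ in range(n)]
        if abs(two_sample_t(a, b)) > t_star:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(0.0, 0.0)       # H0 true: rejections are Type I errors
type_ii_rate = 1 - rejection_rate(0.0, 1.0)  # H0 false: non-rejections are Type II errors
print(f"Type I rate: {type_i_rate:.3f}   Type II rate: {type_ii_rate:.3f}")
```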

Paired Data

  • Compute the difference within each pair, then apply one-sample procedures to those differences.
  • Example 7.5: SAT score improvements with confidence intervals.
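
Because the analysis runs on the differences, a paired interval is just a one-sample t-interval applied to them. The sketch below assumes hypothetical before-and-after scores for eight students (not the data from Example 7.5) and a table value of t* for df = n - 1; the function name is illustrative.

```python
import math


def paired_interval(before, after, t_star):
    """Return (lower, upper) for the mean of the differences after - before."""
    if len(before) != len(after):
        raise ValueError("paired data need equal-length lists")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical SAT scores for the same eight students, before and after a course.
before = [1050, 1120, 980, 1210, 1100, 1010, 1180, 1070]
after = [1090, 1150, 1000, 1230, 1160, 1020, 1200, 1110]
# t* for 95% confidence with df = 8 - 1 = 7, read from a t-table.
low, high = paired_interval(before, after, t_star=2.365)
print(f"95% CI for mean improvement: ({low:.1f}, {high:.1f})")
```

Treating the two columns as independent samples would ignore the pairing and usually gives a much wider interval.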

Simulations and P-Values

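  • Simulations estimate P-values by modeling the distribution of the test statistic under the null hypothesis and finding the proportion of simulated values at least as extreme as the observed one.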
  • Example 7.6: Machinery recalibration based on median absolute deviation calculations.
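
A simulated P-value is the share of null-model statistics at least as extreme as the observed one. The sketch below shows the idea for a one-sided (upper-tail) test. The null model (normal, standard deviation 1, samples of 12), the observed value, and all names are assumptions for illustration and are not taken from Example 7.6; the median absolute deviation is used only because that example mentions it.

```python
import random
import statistics


def median_abs_deviation(sample):
    """Median of the absolute deviations from the sample median."""
    m = statistics.median(sample)
    return statistics.median([abs(x - m) for x in sample])


def simulated_p_value(observed, statistic, draw_null_sample, trials=5_000, seed=3):
    """Estimate P(statistic >= observed) when the null hypothesis is true."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        if statistic(draw_null_sample(rng)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


def draw_null_sample(rng):
    """Hypothetical null model: 12 measurements from a normal with sd 1."""
    return [rng.gauss(0.0, 1.0) for _ in range(12)]


# Hypothetical observed spread from a sample of 12 measurements.
observed_mad = 1.2
p = simulated_p_value(observed_mad, median_abs_deviation, draw_null_sample)
print(f"Estimated P-value: {p:.4f}")
```

More trials give a more stable estimate; a small estimated P-value suggests the observed statistic would be unusual if the null model were correct.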

More on Power, Type I Errors, and Type II Errors

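  • Type I error: Rejecting a true null hypothesis; its probability is the significance level (α).
  • Type II error: Failing to reject a false null hypothesis.
  • Power: Probability of correctly rejecting a false null hypothesis, equal to 1 minus the probability of a Type II error.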
  • Example 7.7: Testing political support claims with power analysis.
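
Power can be computed directly in the simplest setting. The sketch below is a simplification rather than the method of Example 7.7: it assumes a one-sided z-test for a mean with a known population standard deviation, and the numbers and function name are hypothetical. It uses `NormalDist` from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of the test H0: mu = mu0 vs Ha: mu > mu0 when the mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability that the sample mean exceeds the cutoff under the true mean.
    return 1 - NormalDist(mu_true, se).cdf(cutoff)


# Hypothetical setting: H0 mean 100, true mean 105, sigma 15.
for n in (25, 50, 100):
    power = power_one_sided_z(100, 105, 15, n)
    print(f"n={n}: power={power:.3f}, P(Type II)={1 - power:.3f}")
```

Power rises with sample size, with a larger true difference, and with a larger α; when the true mean equals the null value, the rejection probability is just α.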

Confidence Intervals Versus Hypothesis Tests

  • Hypothesis tests assess claims about parameters; confidence intervals estimate parameters.
  • Example 7.8: Comparing basketball player shooting accuracy with hypothesis tests and confidence intervals. The test and the interval lead to consistent decisions at each sample size.
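
For a mean, a two-sided t-test at level α rejects H0: μ = μ0 exactly when the (1 - α) t-interval built from the same data leaves out μ0. The sketch below checks that agreement on hypothetical data; it is not the calculation from Example 7.8, the names are illustrative, and t* is assumed to come from a t-table.

```python
import math


def mean_and_se(sample):
    """Sample mean and standard error s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return x_bar, s / math.sqrt(n)


def rejects_null(sample, mu0, t_star):
    """Two-sided test decision: reject when |t| exceeds the critical value."""
    x_bar, se = mean_and_se(sample)
    return abs((x_bar - mu0) / se) > t_star


def interval_excludes(sample, mu0, t_star):
    """True when mu0 lies outside x_bar ± t_star * se."""
    x_bar, se = mean_and_se(sample)
    return not (x_bar - t_star * se <= mu0 <= x_bar + t_star * se)


# Hypothetical data; t* = 2.776 is the 95% table value for df = 5 - 1 = 4.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
for mu0 in (0.0, 2.0, 3.0, 4.5, 6.0):
    print(mu0, rejects_null(data, mu0, 2.776), interval_excludes(data, mu0, 2.776))
```

The interval adds information the test does not: it shows the whole range of parameter values that the data would not reject.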