
Understanding Sampling Distributions and Inference

Oct 5, 2024

Lecture Notes: Sampling Distributions and Statistical Inference

Announcements

  • Next course project posted on Canvas under the module section.
  • Focus: One-sample T-test and QQ plot.
  • Includes tasks like making histograms, calculating basic statistics.

Sampling Distributions

  • Concept of Sampling Distribution:

    • Derived from taking multiple samples from a population and calculating the mean of each sample.
    • If you repeatedly collect samples and calculate their means, a distribution of these means forms (sampling distribution of the mean).
    • This distribution tends toward a normal shape as the sample size grows, even if the population distribution is skewed.
  • Example in R:

    • Creation of a skewed population.
    • Repeated sampling at size 10 to show the sampling distribution moving towards normality.
    • As sample size increases, the distribution of sample means becomes more normal.
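
The in-class demonstration used R; the same idea can be sketched in Python with only the standard library. Everything specific here is an assumption chosen for illustration: the exponential population, the sample sizes (10, 40, 160), the number of repetitions, and the helper names `skewness` and `sample_means`. Skewness near 0 indicates a symmetric, more normal-looking shape.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (about 0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def sample_means(population, n, n_samples, rng):
    """Draw n_samples random samples of size n (with replacement); return each mean."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]


rng = random.Random(42)

# Assumed population: 50,000 exponential values, which are strongly right-skewed.
population = [rng.expovariate(1.0) for _ in range(50_000)]
population_skew = skewness(population)
print(f"population skewness: {population_skew:.2f}")

# Skewness of the sampling distribution of the mean at several sample sizes.
skews = {}
for n in (10, 40, 160):
    means = sample_means(population, n, 5_000, rng)
    skews[n] = skewness(means)
    print(f"n={n:>3}  skewness of sample means: {skews[n]:.2f}")
```

The skewness of the sample means should shrink as the sample size grows, which is the "progression towards normality" described above.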

Central Limit Theorem (CLT)

  • Definition:

    • States that the sampling distribution of the sample mean becomes approximately normal as the sample size increases, provided the population has a finite variance.
  • Criteria for CLT to Hold:

    • Samples must be independent.
    • Observations should be randomly collected.
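
For reference, the classical form of the theorem for the sample mean can be written as below. The symbols are not from the lecture and are introduced here as assumptions: X̄ₙ is the mean of n independent, identically distributed observations, μ is the population mean, and σ is the (finite) population standard deviation.

```latex
\[
  \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}
  \;\xrightarrow{\,d\,}\; \mathcal{N}(0, 1)
  \qquad \text{as } n \to \infty
\]
```

In words: once the sample mean is centered at μ and scaled by σ/√n, its distribution approaches the standard normal as n grows.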

Importance of Sampling Distribution

  • Mean of Sampling Distribution vs Population Mean:

    • The mean of the sampling distribution equals the population mean.
    • The sample mean is an unbiased estimate of the population mean.
  • Standard Error:

    • Defined as the standard deviation of the sampling distribution.
    • Indicates natural fluctuations or error from sample to sample.
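
Both claims above can be checked with a small simulation. This is a sketch under stated assumptions, not the lecture's own code: the exponential population, the sample size of 25, and the 10,000 repetitions are all illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)

# Assumed population: 20,000 right-skewed (exponential) values.
population = [rng.expovariate(0.5) for _ in range(20_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 25  # size of each sample (illustrative)
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(10_000)]

mean_of_means = statistics.fmean(means)  # should sit close to mu
empirical_se = statistics.stdev(means)   # SD of the sampling distribution
theoretical_se = sigma / math.sqrt(n)    # sigma / sqrt(n)

print(f"population mean      : {mu:.3f}")
print(f"mean of sample means : {mean_of_means:.3f}")
print(f"empirical SE         : {empirical_se:.3f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.3f}")
```

The mean of the sample means should land close to the population mean, and the standard deviation of the sample means should land close to σ/√n.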

Standard Error and Confidence Intervals

  • Standard Error (SE) Calculation:

    • Formula: SE = σ / √n, where σ is the population standard deviation and n is the sample size.
    • In practice, use the sample standard deviation s in place of σ, since σ is often unknown.
  • Confidence Interval (CI):

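    • A range of values that, at a chosen confidence level (commonly 95%), is likely to contain the true population mean.
    • Calculated as the sample mean plus or minus a critical value times the standard error.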
  • Implications of Sample Size:

    • Larger sample sizes reduce standard error and result in narrower confidence intervals (more precise estimates).
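
The effect of sample size on the interval can be seen in a short sketch. The function name `mean_ci` and the summary statistics (mean 50, standard deviation 10) are hypothetical, and the interval uses the normal critical value, which is a reasonable approximation only for larger samples.

```python
import math
from statistics import NormalDist


def mean_ci(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation CI: sample_mean +/- z * sample_sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = sample_sd / math.sqrt(n)
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical summary statistics, held fixed while only n changes.
widths = {}
for n in (25, 100, 400):
    low, high = mean_ci(50.0, 10.0, n)
    widths[n] = high - low
    print(f"n={n:>3}  95% CI=({low:.2f}, {high:.2f})  width={widths[n]:.2f}")
```

Because the standard error divides by √n, quadrupling the sample size halves the width of the interval.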

Challenges in Practical Research

  • In real-world scenarios, researchers often have only one sample.
  • True population standard deviation is usually unknown.
  • Solution: Use sample standard deviation to estimate population standard deviation.

T-Distribution

  • Introduction to T-Distribution:

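    • Used when the population standard deviation is unknown and the sample is small (a common rule of thumb is fewer than 30 observations).
    • Heavier tails than the normal distribution; it approaches the normal distribution as the sample size grows.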
    • Developed for estimating population parameters when sample sizes are small.
  • Example Using T-Distribution:

    • Scenario: Researcher claims college students spend 9 hours a day on phones.
    • A small sample is collected, and its mean and standard deviation are calculated.
    • Use the T-distribution to build a confidence interval and check whether it contains the claimed 9 hours.
  • Historical Context:

    • Developed by William Gosset at Guinness Brewery under pseudonym "Student".
    • Addressed the problems that arise when the normal distribution is applied to small samples.
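
The phone-use example above can be sketched as follows. The ten data values are invented for illustration, not the lecture's data, and the helper name `t_confidence_interval` is hypothetical. The critical values are two-sided 95% entries copied from a standard t table, since the Python standard library has no T-distribution.

```python
import math
import statistics

# Two-sided 95% critical values from a standard t table, keyed by degrees of freedom.
T_CRIT_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}


def t_confidence_interval(sample):
    """95% CI for the mean: mean +/- t * s / sqrt(n), with n - 1 degrees of freedom."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)
    t_crit = T_CRIT_95[n - 1]      # only the table's listed sample sizes are supported
    return mean - t_crit * se, mean + t_crit * se


# Hypothetical data: daily phone hours reported by 10 students.
hours = [6.5, 7.0, 8.5, 5.5, 9.0, 7.5, 6.0, 8.0, 7.0, 6.5]
claimed_mean = 9.0

low, high = t_confidence_interval(hours)
claim_is_plausible = low <= claimed_mean <= high
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"claimed mean of {claimed_mean} inside interval: {claim_is_plausible}")
```

For this made-up sample the interval excludes 9 hours, so the data would not support the claim. Note that the critical value for 9 degrees of freedom (2.262) is larger than the normal value of about 1.96, which reflects the heavier tails of the T-distribution.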

Summary

  • Recap:

    • Importance of sampling distributions in statistical inference.
    • Use of central limit theorem to justify normality assumption.
    • Variability in sample estimates captured by standard error.
    • Confidence intervals provide a range for the likely population mean.
  • Transition to T-Distribution:

    • The T-distribution serves as the model when sample sizes are small and the population standard deviation must be estimated from the sample.