
Inference for Categorical Data Concepts

May 7, 2025

Unit 6: Inference for Categorical Data: Proportions

Confidence Intervals

  • Confidence Interval Meaning

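    • The confidence level is the percentage of all possible samples for which the interval pinpoints the unknown p to within the margin of error.
    • Once a specific interval has been computed, p either lies in it or does not, so the probability that p is in that interval is 1 or 0; the confidence level describes the method, not any single interval.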
  • Two Key Aspects

    • Confidence interval: usually written as estimate ± margin of error.
      • Standard Error: the typical distance between the sample statistic and the population parameter.
      • Margin of Error: a critical value times the standard error; the critical value is set by the confidence level.
    • Success rate of the method (Confidence level): the proportion of repeated samples for which the method yields an interval that captures the true parameter.
  • Conditions for Inference

    • Independence Assumption: sampled individuals must be independent of one another, which random sampling helps ensure.
      • When sampling without replacement, the sample size should not exceed 10% of the population.
    • Normality Assumption:
      • Proportions: the normal model applies if both np ≥ 10 and nq ≥ 10, where q = 1 − p.
      • Means: normal model applies if population is normal or sample size is large (n ≥ 30).
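
The "success rate of the method" reading of the confidence level can be checked by simulation. The sketch below is illustrative only: the true proportion (0.40), sample size (200), trial count, and the function name `coverage_rate` are assumptions, not values from the lecture. It draws many random samples, builds an interval from each as estimate ± critical value × standard error, and reports how often the interval captures the true p.

```python
import random
from statistics import NormalDist


def coverage_rate(true_p, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated samples whose interval captures true_p."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
        margin_of_error = z_star * standard_error
        # Each individual interval either captures true_p or it does not.
        if p_hat - margin_of_error <= true_p <= p_hat + margin_of_error:
            hits += 1
    return hits / trials


# Hypothetical setting: true p = 0.40, samples of size 200.
print(coverage_rate(true_p=0.40, n=200))
```

Under these assumptions the printed rate should land near the stated confidence level, while any one interval in the loop simply hits or misses.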

Example 6.1

  • Choosing a value for population proportion p that satisfies normal model conditions.
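
The notes do not record the numbers used in Example 6.1, so the following is only a sketch with made-up inputs. The helper name `check_proportion_conditions` and its arguments are assumptions for illustration; it applies the 10% condition and the np ≥ 10, nq ≥ 10 condition listed above.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Report which conditions for a normal model of a sample proportion hold."""
    q = 1 - p
    return {
        # 10% condition: can only be checked when the population size is known.
        "ten_percent": None if population_size is None else n <= 0.10 * population_size,
        # Large-counts condition: expected successes and failures are both at least 10.
        "large_counts": n * p >= 10 and n * q >= 10,
    }


# Hypothetical values: the same sample size with two candidate values of p.
print(check_proportion_conditions(n=100, p=0.30, population_size=5000))
print(check_proportion_conditions(n=100, p=0.05, population_size=5000))
```

With n = 100, a candidate p of 0.30 gives 30 expected successes and 70 expected failures, while p = 0.05 gives only 5 expected successes, so the normal model would not be appropriate in the second case.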

Confidence Interval for a Proportion

  • Estimation of population proportion p using sample proportion.

  • **Properties:**

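    1. For large enough n, the sample proportions are approximately normally distributed.
    2. The mean of the sample proportions equals the population proportion p.
    3. The standard deviation of the sample proportions is √(p(1 − p)/n); because p is unknown, the standard error √(p̂(1 − p̂)/n) is used in its place.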
  • Example 6.2: Calculating a 99% confidence interval for survey data.
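
The survey figures from Example 6.2 are not recorded in these notes, so the counts below (212 "yes" answers out of 500) are hypothetical. As a sketch of the calculation, the assumed helper `one_proportion_z_interval` computes the sample proportion, its standard error, the margin of error, and the resulting 99% interval.

```python
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence):
    """Return (p_hat, standard_error, margin_of_error, low, high)."""
    p_hat = successes / n
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # p is unknown, so p_hat stands in for it in the standard deviation formula.
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    margin_of_error = z_star * standard_error
    return (p_hat, standard_error, margin_of_error,
            p_hat - margin_of_error, p_hat + margin_of_error)


# Hypothetical survey: 212 of 500 respondents answer "yes".
p_hat, se, me, low, high = one_proportion_z_interval(212, 500, confidence=0.99)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"99% interval: ({low:.3f}, {high:.3f})")
```

Raising the confidence level increases the critical value and therefore widens the interval; a larger sample shrinks the standard error and narrows it.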

Significance Testing

  • Process

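    • State a null hypothesis and test it against sample data.
    • Reject the null hypothesis if the sample statistic is so far from the hypothesized value that such a result would be unlikely by chance alone (P-value below the significance level α).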
    • Types of errors:
      1. Type I Error: Rejecting a true null hypothesis.
      2. Type II Error: Failing to reject a false null hypothesis.
  • Type I vs Type II Errors

    • α-Risk: Probability of Type I error.
    • β-Risk: Probability of Type II error (1 - β is the power of the test).
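
The relationship between α, β, and power can be worked out numerically for one specific scenario. The sketch below assumes an upper-tail test of H₀: p = p₀ against Hₐ: p > p₀ and a particular true value of p; the function name `upper_tail_test_risks` and every number in it are illustrative assumptions, not values from the lecture.

```python
from statistics import NormalDist


def upper_tail_test_risks(p0, p_true, n, alpha):
    """Return (alpha, beta, power) for H0: p = p0 vs Ha: p > p0 when p equals p_true."""
    z = NormalDist()
    # Reject H0 when p_hat exceeds this cutoff; the cutoff is built assuming H0 is true.
    cutoff = p0 + z.inv_cdf(1 - alpha) * (p0 * (1 - p0) / n) ** 0.5
    # Spread of p_hat when the true proportion is p_true instead.
    sd_true = (p_true * (1 - p_true) / n) ** 0.5
    # Type II error: p_hat falls below the cutoff even though H0 is false.
    beta = z.cdf((cutoff - p_true) / sd_true)
    return alpha, beta, 1 - beta


# Hypothetical scenario: H0 says p = 0.50, the truth is p = 0.60, n = 100, alpha = 0.05.
alpha, beta, power = upper_tail_test_risks(p0=0.50, p_true=0.60, n=100, alpha=0.05)
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

In this setup, lowering α moves the cutoff further from p₀ and so raises β; increasing n lowers β for the same α.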

Example 6.3

  • Hypothesis test to determine if union support claim is valid.
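
The claimed level of support and the sample data for Example 6.3 are not given in these notes. As a sketch with hypothetical numbers (a claim of 75% support, and 210 supporters in a sample of 300), a one-proportion z-test might be coded as follows; `one_proportion_z_test` is an assumed helper name.

```python
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a one-proportion z-test of H0: p = p0."""
    p_hat = successes / n
    # The standard deviation uses p0 because the test assumes H0 is true.
    sd0 = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / sd0
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":
        p_value = lower_tail
    elif alternative == "greater":
        p_value = 1 - lower_tail
    elif alternative == "two-sided":
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Hypothetical data: the claim is 75% support; 210 of 300 sampled members support.
z, p_value = one_proportion_z_test(210, 300, p0=0.75, alternative="less")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The P-value is then compared with the chosen α: if it is smaller, the sample gives evidence against the claim at that significance level.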

Confidence Interval for Difference of Two Proportions

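  • Describes the sampling distribution of the difference between two sample proportions, p̂₁ − p̂₂, which is used to compare two populations. For independent samples, its mean is p₁ − p₂ and its standard deviation is √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂).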
  • Example 6.4: Confidence interval to compare job satisfaction between different shifts.
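
The shift data from Example 6.4 are not reproduced in these notes, so the counts below are hypothetical. The sketch computes a two-proportion z-interval using the unpooled standard error √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂); the name `two_proportion_z_interval` is an assumption.

```python
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, low, high) for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    difference = p1_hat - p2_hat
    # Variances of independent sample proportions add.
    standard_error = (p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2) ** 0.5
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * standard_error
    return difference, difference - margin_of_error, difference + margin_of_error


# Hypothetical data: 112 of 160 day-shift and 72 of 120 night-shift workers are satisfied.
difference, low, high = two_proportion_z_interval(112, 160, 72, 120)
print(f"difference = {difference:.2f}, 95% interval: ({low:.3f}, {high:.3f})")
```

If the resulting interval contains 0, the data are consistent with no difference between the two population proportions at that confidence level.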

Significance Test for the Difference of Two Proportions

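  • The null hypothesis typically states that the two population proportions are equal (H₀: p₁ = p₂), so the two samples are pooled to estimate the common proportion used in the standard error.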
  • Example 6.5: Testing whether a higher proportion of First Nations children are in child welfare care compared to non-Aboriginal children.
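
The data for Example 6.5 are not included in these notes. The sketch below therefore uses generic, purely hypothetical counts that are not drawn from any real child welfare statistics. It shows a one-sided two-proportion z-test of H₀: p₁ = p₂ against Hₐ: p₁ > p₂ with a pooled proportion; `two_proportion_z_test` is an assumed helper name.

```python
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, p_value) for H0: p1 = p2 versus Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling the samples.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1_hat - p2_hat) / standard_error
    # Upper-tail P-value because Ha claims the first proportion is higher.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 60 of 400 in group 1 and 40 of 400 in group 2.
z, p_value = two_proportion_z_test(60, 400, 40, 400)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Pooling is used here only because the null hypothesis asserts equal proportions; a confidence interval for the difference would use the unpooled standard error instead.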

These notes summarize the key principles and examples from the lecture on inference for categorical data: confidence intervals, significance testing, and the two types of error in hypothesis testing. The examples apply each concept to a concrete setting.