
Understanding Chi-Square Tests

May 8, 2025

Unit 8: Inference for Categorical Data: Chi-Square

Chi-Square Test for Goodness-of-Fit

  • Goodness-of-Fit: Evaluate the fit of a theoretical distribution to observed data.
  • Null Hypothesis: The given theoretical distribution correctly describes the situation.
  • Chi-Square Statistic (𝜒²): 𝜒² = Σ (O − E)²/E, summed over all categories, where O is an observed count and E is the count expected under the null hypothesis.
    • The smaller the 𝜒²-value, the better the fit.
    • P-value: Probability of obtaining a 𝜒²-value at least as large as the one observed, assuming the null hypothesis is true (the test is right-tailed).
    • A large 𝜒²-value indicates a poor fit; if the resulting P-value is below the significance level (for example 0.05), reject the null hypothesis.
  • Chi-Square Distribution:
    • Nonnegative values, not symmetric, skewed to the right.
    • Different distributions for different degrees of freedom (df); for a goodness-of-fit test, df = (number of categories − 1).
    • As df increases, the distribution becomes more symmetric and closer to normal.
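To make the statistic concrete, here is a minimal Python sketch that computes 𝜒² = Σ (O − E)²/E and df = (number of categories − 1) for a goodness-of-fit test. The function name `goodness_of_fit` and the counts are hypothetical, chosen only for illustration; they are not the data from Example 8.1.

```python
def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit statistic.

    observed:    observed counts, one per category
    proportions: hypothesized proportions under H0 (must sum to 1)
    Returns (chi2, df, expected).
    """
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    # Expected count for each category: E = n * p
    expected = [n * p for p in proportions]
    # Sum of (O - E)^2 / E over all categories
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return chi2, df, expected


# Hypothetical data: 100 observations in 4 categories
observed = [30, 18, 27, 25]
proportions = [0.30, 0.20, 0.25, 0.25]
chi2, df, expected = goodness_of_fit(observed, proportions)
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 = 0.360, df = 3
print("expected counts all >= 5:", all(e >= 5 for e in expected))
```

The small 𝜒²-value here reflects observed counts that sit close to the expected counts 30, 20, 25, 25.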

Example 8.1

  • Scenario: Distribution of liquor stores across four city regions.
  • Hypotheses:
    • H₀: Liquor stores are distributed according to region area percentages.
    • Hₐ: Distribution differs from region area percentages.
  • Procedure: Chi-square test for Goodness-of-fit.
  • Checks:
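    • Random sample; all expected counts > 5.
    • Sample size < 10% of the population.
  • Calculation: 𝜒² = 6.264, df = 4 − 1 = 3, P = 0.099.
  • Conclusion: Since P = 0.099 > 0.05, there is insufficient evidence to reject H₀; the data do not show that liquor stores are distributed differently from the region area percentages.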
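The reported P-value can be checked from 𝜒² and df alone. The test is right-tailed, so P is the area to the right of 6.264 under the 𝜒² curve with df = 4 − 1 = 3 (four regions). The sketch below uses only Python's standard library and computes the tail with the closed-form expressions that hold for integer df (a textbook identity, not any particular library's method); `chi2_sf` is a hypothetical helper name. If SciPy is installed, `scipy.stats.chi2.sf(6.264, 3)` should give the same value.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution with integer df >= 1.

    Uses the closed-form tail formulas that hold for integer degrees of freedom.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus (df - 1)/2 correction terms
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * r + 1)
    return p


# Example 8.1: four regions, so df = 4 - 1 = 3
p_value = chi2_sf(6.264, 4 - 1)
alpha = 0.05
print(f"P = {p_value:.3f}")  # P = 0.099
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```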

Chi-Square Test for Independence

  • Test of Independence: Determine if there is a significant association between two categorical variables.
  • Null Hypothesis: The two classifications are independent.
  • Expected Counts: For each cell, (row total × column total) / table total.
  • Degrees of Freedom (df): (number of rows − 1) × (number of columns − 1).
  • Limitation: Even if independence is rejected, an association does not establish causation.
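In a two-way table, the count expected in each cell if the variables were independent is (row total × column total) / table total. The minimal sketch below computes those expected counts, 𝜒², and df = (rows − 1) × (columns − 1), assuming the table is given as a list of rows of observed counts. The function name `independence_test` and the 2 × 3 counts are hypothetical.

```python
def independence_test(table):
    """Chi-square statistic for an r x c table of observed counts.

    table: list of rows, each a list of counts (all rows the same length)
    Returns (chi2, df, expected).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Hypothetical 2 x 3 table: rows = groups, columns = response categories
table = [
    [20, 30, 50],
    [30, 20, 50],
]
chi2, df, expected = independence_test(table)
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 = 4.000, df = 2
print(expected)  # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
```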

Example 8.2

  • Scenario: Relationship between party affiliation and support for marijuana legalization.
  • Hypotheses:
    • H₀: Party affiliation and support are independent.
    • Hₐ: Party affiliation and support are not independent.
  • Procedure: Chi-square test for independence.
  • Checks:
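    • Random sample, n < 10% of the population, all expected cell counts > 5.
  • Calculation: 𝜒² = 94.5, P ≈ 0.000 (the P-value rounds to zero; it is not exactly zero).
  • Conclusion: P < 0.05, so there is sufficient evidence to reject H₀; the data suggest an association between party affiliation and support for legalization.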

Chi-Square Test for Homogeneity

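  • Test for Homogeneity: Test whether the distribution of a single categorical variable is the same across two or more populations, using a separate sample from each population.
  • Conditions:
    • Each sample is a simple random sample.
    • The samples are independent of one another.
    • Each sample is less than 10% of its population.
    • All expected counts are at least 5.
  • Degrees of Freedom (df): (number of rows − 1) × (number of columns − 1), as in the test for independence.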
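A homogeneity test compares each sample's conditional distribution with the pooled distribution of all samples combined. Under H₀, the expected count for a cell is the sample size times the pooled proportion for that category, which works out to the same (row total × column total) / table total used for independence. A minimal sketch with hypothetical counts for two populations and three response categories; `homogeneity_summary` is an illustrative name, not a library function.

```python
def homogeneity_summary(samples):
    """Compare one categorical variable across several populations.

    samples: one list of counts per population, same category order in each
    Returns (conditional, pooled, expected, chi2, df).
    """
    sizes = [sum(s) for s in samples]
    total = sum(sizes)
    # Pooled proportion of each category across all samples combined
    pooled = [sum(col) / total for col in zip(*samples)]
    # Each sample's own (conditional) distribution
    conditional = [[count / n for count in s] for s, n in zip(samples, sizes)]
    # Expected count if H0 is true: sample size * pooled proportion
    expected = [[n * p for p in pooled] for n in sizes]
    chi2 = sum(
        (o - e) ** 2 / e
        for s, exp_row in zip(samples, expected)
        for o, e in zip(s, exp_row)
    )
    df = (len(samples) - 1) * (len(pooled) - 1)
    return conditional, pooled, expected, chi2, df


# Hypothetical samples from two populations, three response categories
samples = [
    [40, 35, 25],  # population A, n = 100
    [20, 25, 5],   # population B, n = 50
]
conditional, pooled, expected, chi2, df = homogeneity_summary(samples)
for name, dist in zip("AB", conditional):
    print(name, [round(p, 2) for p in dist])
print("pooled", [round(p, 2) for p in pooled])
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 = 5.625, df = 2
```

The larger the gap between a sample's conditional distribution and the pooled one, the more that sample contributes to 𝜒².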

Example 8.3

  • Scenario: Job satisfaction among school employees.
  • Hypotheses:
    • H₀: The distribution of job satisfaction is the same for all job categories.
    • Hₐ: The distribution of job satisfaction differs for at least two job categories.
  • Procedure: Chi-square test for homogeneity.
  • Checks:
    • Independent random samples, all expected cell counts > 5, each sample < 10% of its population.
  • Calculation: 𝜒² = 8.707, P = 0.0335.
  • Conclusion: P = 0.0335 < 0.05, so there is sufficient evidence to reject H₀; the distribution of job satisfaction appears to differ across job categories.