📊

Overview of Chi-squared Tests

May 5, 2025

Chi-squared Test Overview

Introduction

  • Chi-squared test: A statistical hypothesis test used to analyze contingency tables when sample sizes are large.
  • Most commonly used to examine whether two categorical variables (the two dimensions of a contingency table) are independent.

Types of Chi-squared Tests

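  • Pearson's chi-squared test: Determines whether there is a statistically significant difference between the expected and observed frequencies in one or more categories of a contingency table.
  • Fisher's exact test: An exact alternative to the chi-squared approximation, used for contingency tables with small sample sizes.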
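For a 2×2 table, Fisher's exact test needs nothing beyond binomial coefficients. The minimal Python sketch below assumes the table is laid out as [[a, b], [c, d]] with its row and column totals treated as fixed, and uses the common two-sided definition that sums the probabilities of every table no more likely than the observed one; `fisher_exact_2x2` is a hypothetical helper name, not a library function.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    # Range of top-left values compatible with the margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum tables at most as likely as the observed one; the small relative
    # tolerance keeps exact ties from being lost to floating-point error.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))
```

For example, `fisher_exact_2x2(3, 1, 1, 3)` keeps four of the five possible tables and returns 34/70, about 0.486.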

Applications

  • Observations are classified into mutually exclusive classes.
  • Under the null hypothesis of no differences between the classes, the test statistic computed from the observations approximately follows a chi-squared distribution.
  • Used to test independence of random variables.

Mathematical History

  • Developed in the late 19th and early 20th centuries by Karl Pearson.
  • Pearson introduced the Pearson distribution, a family of continuous probability distributions, to model data that did not fit the normal distribution, challenging the assumption that data are normally distributed.

Pearson's Chi-squared Test

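  • Published by Karl Pearson in 1900, it is considered one of the foundations of modern statistics.
  • Classifies n observations into k mutually exclusive classes, where class i has probability $p_i$ under the null hypothesis and expected count $m_i = n p_i$.
  • The test statistic $X^2 = \sum_{i=1}^{k} (x_i - m_i)^2 / m_i$, where $x_i$ is the observed count in class i, approximately follows a chi-squared distribution with k - 1 degrees of freedom when n is large.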
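The goodness-of-fit procedure above can be sketched in Python with only the standard library. The helper names `chi2_sf` and `pearson_gof` are hypothetical; `chi2_sf` computes the chi-squared upper-tail probability from closed forms of the regularized gamma function that hold only for integer degrees of freedom, which is all this test needs.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-z} * sum_{j=0}^{df/2 - 1} z^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= z / j
            total += term
        return math.exp(-z) * total
    # Odd df: start from the df = 1 case and add recurrence terms.
    total = math.erfc(math.sqrt(z))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def pearson_gof(observed, probs):
    """Pearson's X^2 for observed counts x_i against null probabilities p_i."""
    n = sum(observed)
    expected = [n * p for p in probs]  # m_i = n * p_i
    stat = sum((x - m) ** 2 / m for x, m in zip(observed, expected))
    df = len(observed) - 1  # k - 1 degrees of freedom
    return stat, df, chi2_sf(stat, df)
```

With hypothetical die rolls `[8, 12, 9, 11, 10, 10]` and `probs = [1/6] * 6`, every expected count is 10, so X² = 1.0 with 5 degrees of freedom and the p-value is about 0.96: no evidence against a fair die.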

Additional Chi-squared Tests

  • Test for variance: Tests if the variance of a normally distributed population has a specific value.
  • Other Tests:
    • Cochran-Mantel-Haenszel test
    • McNemar's test
    • Tukey's test of additivity
    • Portmanteau test in time series analysis
    • Likelihood-ratio tests

Yates's Correction for Continuity

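  • Suggested by Frank Yates to reduce the error introduced by approximating the discrete distribution of counts with the continuous chi-squared distribution.
  • Adjusts Pearson's chi-squared test in 2×2 tables by subtracting 0.5 from the absolute difference between each observed and expected value before squaring.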
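A minimal sketch of the corrected statistic for a 2×2 table, assuming expected counts are derived from the row and column totals. The name `chi2_2x2` is hypothetical, and clamping the corrected difference at zero (so the correction never overshoots) is a common convention assumed here, not part of the text above.

```python
import math


def chi2_2x2(table, yates=True):
    """Pearson's X^2 and p-value (1 degree of freedom) for a 2x2 table,
    optionally with Yates's continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Subtract 0.5 from |O - E|, clamped at zero (assumed convention).
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For the hypothetical table `[[10, 20], [20, 10]]`, every expected count is 15 and |O - E| = 5, so the uncorrected statistic is 4 · 25 / 15 ≈ 6.67 while the corrected one is 4 · 4.5² / 15 = 5.4, giving a larger (more conservative) p-value.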

Chi-squared Test for Variance in a Normal Population

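  • If a sample of size n is drawn from a normal distribution, tests whether the population variance equals a pre-determined value $\sigma_0^2$; under the null hypothesis, $T = (n - 1)s^2 / \sigma_0^2$, where $s^2$ is the sample variance, follows a chi-squared distribution with n - 1 degrees of freedom.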
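The statistic for this test is simple to compute directly. This sketch assumes the data really are a sample from a normal population (the chi-squared distribution of T relies on it) and leaves the p-value to a chi-squared table or statistics library; `variance_chi2_stat` is a hypothetical name.

```python
import statistics


def variance_chi2_stat(sample, sigma0_sq):
    """Return T = (n - 1) s^2 / sigma0^2 and its degrees of freedom n - 1."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance with divisor n - 1
    # For a two-sided test, compare T with the lower and upper
    # chi-squared quantiles for n - 1 degrees of freedom.
    return (n - 1) * s_sq / sigma0_sq, n - 1
```

For the sample `[1, 2, 3, 4, 5]` and a hypothesized variance of 2, the sample variance is 2.5, so T = 4 · 2.5 / 2 = 5.0 with 4 degrees of freedom.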

Applied Example

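  • An example classifies a sample of city residents by occupational category to test the independence of two categorical variables.
  • The test statistic is computed from the table, and the null hypothesis of independence is rejected if the statistic is improbably large under a chi-squared distribution with (rows - 1)(columns - 1) degrees of freedom.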
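The independence test described in this example can be sketched for any r × c table of counts. The counts below are made up for illustration and are not the data of the original example; `chi2_sf` and `chi2_independence` are hypothetical helpers, and `chi2_sf` assumes integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= z / j
            total += term
        return math.exp(-z) * total
    total = math.erfc(math.sqrt(z))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi2_independence(table):
    """Pearson's X^2 test of independence for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: 2 groups of residents x 3 occupational categories.
counts = [[30, 20, 10],
          [20, 30, 10]]
stat, df, p = chi2_independence(counts)
print(f"X^2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Here the expected counts are 25, 25 and 10 in each row, giving X² = 4.0 with 2 degrees of freedom and p = e^{-2} ≈ 0.135, so independence would not be rejected at the 5% level.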

Uses in Various Fields

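  • Cryptanalysis: Compares the character distribution of plaintext with that of candidate decryptions of a ciphertext; a low statistic suggests the decryption is likely correct.
  • Bioinformatics: Compares the distribution of gene properties (such as mutation rate) across categories of genes (such as disease genes and essential genes).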