Categorical Association (Chi-Square) Test

Aug 13, 2025

Overview

This lecture introduces the categorical association (chi-square) hypothesis test, which uses a contingency table to evaluate whether two categorical variables are related. It covers the hypotheses, data requirements, assumptions, and test statistic.

Hypotheses for Categorical Association Test

  • The null hypothesis states that the categorical variables are not related (are independent).
  • The alternative hypothesis states that the categorical variables are related (are dependent).
  • Hypotheses may also be written as "not associated vs. associated" or "independent vs. dependent".

Data Requirements & Types

  • Requires two sets of categorical data, collected from either one random sample (each individual answers two questions) or multiple random samples (each sample answers one question).
  • If data comes from one random sample, it’s often called a test of independence.
  • If data comes from multiple random samples, it’s called a test of homogeneity.
  • Data is summarized in a contingency (two-way) table.
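
As a rough illustration of how paired categorical responses become a contingency table, here is a minimal Python sketch. The response values and the helper name `contingency_table` are hypothetical and were not part of the lecture.

```python
from collections import Counter

# Hypothetical paired responses: each tuple holds one individual's answers
# to two categorical questions (values invented for illustration).
responses = [
    ("yes", "urban"), ("no", "rural"), ("yes", "urban"),
    ("no", "urban"), ("yes", "rural"), ("no", "rural"),
]

def contingency_table(pairs):
    """Return (row_labels, col_labels, counts) for paired categorical data."""
    cell_counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # One row per category of the first variable, one column per category
    # of the second; combinations that never occur get a count of 0.
    counts = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, counts

rows, cols, table = contingency_table(responses)
for label, row in zip(rows, table):
    print(label, dict(zip(cols, row)))
```

Each cell holds the number of individuals who gave that combination of answers. Row and column totals are left out because they are not observed counts.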

Test Statistic & Degrees of Freedom

  • The test uses the chi-square test statistic: sum of (observed count − expected count)² / expected count.
  • Degrees of freedom = (number of rows − 1) × (number of columns − 1).
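
Written symbolically, the statistic and degrees of freedom above might look like the following. The symbols are notation introduced here rather than taken from the lecture: O_ij and E_ij are the observed and expected counts in row i and column j, r and c are the numbers of rows and columns, and n is the grand total. The expression for E_ij is the standard expected count under independence.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```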

Assumptions

  • Data must come from random samples.
  • Individuals within a sample must be independent of one another; with multiple samples, individuals must be independent both within and between samples.
  • Expected counts in each cell of the contingency table must be at least 5.
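
One way to check the expected-count condition is sketched below in Python. It assumes the standard expected-count formula (row total × column total / grand total), under which the smallest expected count always comes from the smallest row total paired with the smallest column total. The table values and the function name are hypothetical.

```python
def smallest_expected_count(observed):
    """Smallest expected count, computed from the table margins alone."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total, so the
    # minimum occurs at the smallest row total and smallest column total.
    return min(row_totals) * min(col_totals) / grand_total

# Hypothetical observed counts (3 rows x 2 columns), invented for illustration.
observed = [[12, 8], [9, 11], [3, 5]]

smallest = smallest_expected_count(observed)
meets_condition = smallest >= 5
print(smallest, meets_condition)
```

Because the check uses only the margins, it can be run before filling in the full table of expected counts.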

Interpreting Observed vs. Expected Counts

  • Observed counts are the actual sample data in the table (excluding totals).
  • Expected counts are calculated based on the assumption that the null hypothesis is true.
  • Comparing observed with expected counts indicates whether the variables are related: large differences between them are evidence against the null hypothesis.
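
A minimal Python sketch of the expected-count calculation follows, assuming the standard formula (row total × column total / grand total). The observed counts are invented for illustration and `expected_counts` is a hypothetical helper name.

```python
def expected_counts(observed):
    """Expected counts under the null hypothesis of no association."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table of observed counts (totals excluded).
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)

# Show each observed count next to its expected count and their difference.
for obs_row, exp_row in zip(observed, expected):
    for obs, exp in zip(obs_row, exp_row):
        print(f"observed={obs:3d}  expected={exp:5.1f}  difference={obs - exp:+.1f}")
```

The expected table keeps the same row and column totals as the observed table; only the way those totals are spread across the cells changes.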

Example Overview

  • An experiment tested whether listening to favorite music, hated music, or no music affects the ability to memorize information. It used three random samples, so this is a test of homogeneity.
  • Data was summarized in a contingency table.
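
To suggest how a table like this could be analyzed, here is a hedged Python sketch. The three rows match the music conditions in the example, but the two outcome categories and every count are invented placeholders, not the experiment's data. The p-value line relies on a shortcut that holds only when the degrees of freedom equal 2.

```python
import math

# Rows: the three music conditions from the example (one random sample each).
# Columns: an ASSUMED two-category outcome. All counts are invented.
conditions = ["favorite music", "hated music", "no music"]
outcomes = ["recalled", "did not recall"]  # hypothetical categories
observed = [
    [20, 30],
    [15, 35],
    [25, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom, the chi-square upper-tail probability
# is exp(-x / 2). Other df values need a chi-square table or a stats library.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.4f}")
```

A small p-value would count as evidence that the outcome distribution differs across the music conditions. With these placeholder counts the result says nothing about the actual experiment.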

Key Terms & Definitions

  • Categorical association test — A test that determines if there is a relationship between two categorical variables.
  • Chi-square test statistic — A measure comparing observed and expected frequencies in categorical data.
  • Contingency table — A table displaying the frequency distribution of variables.
  • Degrees of freedom — Calculation based on number of rows and columns: (rows−1)×(columns−1).
  • Expected count — The frequency expected in a cell if variables are independent.

Action Items / Next Steps

  • Review how to calculate expected counts for contingency tables.
  • Practice identifying correct hypothesis wording for various association scenarios.
  • Complete any assigned readings or problems on categorical association tests.