Overview
This lecture introduces the categorical association (chi-square) hypothesis test, which uses a contingency table to evaluate whether two categorical variables are related. It covers the hypotheses, data requirements, assumptions, and test statistic.
Hypotheses for Categorical Association Test
- The null hypothesis states that the categorical variables are not related (are independent).
- The alternative hypothesis states that the categorical variables are related (are dependent).
- Hypotheses may also be written as "not associated vs. associated" or "independent vs. dependent".
Data Requirements & Types
- Requires data on two categorical variables, collected from either one random sample (each individual answers two questions) or multiple random samples (each individual answers one question).
- If data comes from one random sample, it’s often called a test of independence.
- If data comes from multiple random samples, it’s called a test of homogeneity.
- Data is summarized in a contingency (two-way) table.
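As a minimal sketch of how raw responses become a contingency table, the Python below cross-tabulates pairs of categories into counts. The function name `contingency_table` and the sample responses are hypothetical, made up purely for illustration; they are not data from the lecture.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into cell counts."""
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    # One inner list per row category, one entry per column category.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical responses: (music condition, recalled the information?)
responses = [
    ("favorite", "yes"), ("favorite", "no"), ("hated", "no"),
    ("none", "yes"), ("none", "yes"), ("hated", "yes"), ("favorite", "no"),
]

rows, cols, table = contingency_table(responses)
for label, counts_row in zip(rows, table):
    print(label, dict(zip(cols, counts_row)))
```

The table holds only the cell counts; row and column totals are computed separately when needed.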
Test Statistic & Degrees of Freedom
- The test uses the chi-square test statistic: the sum, over every cell of the table, of (observed count − expected count)² / expected count.
- Degrees of freedom = (number of rows − 1) × (number of columns − 1).
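The two formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not a full test procedure: the function names are hypothetical, the counts are made-up numbers, and the expected counts are taken as given inputs.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1), totals excluded."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Made-up 2x2 example: observed counts and their matching expected counts.
observed = [[10, 20], [30, 40]]
expected = [[12.0, 18.0], [28.0, 42.0]]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(round(statistic, 4), df)
```

A larger statistic means the observed counts sit farther from what the null hypothesis predicts.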
Assumptions
- Data must come from random samples.
- Individuals within a sample must be independent; with multiple samples, individuals must be independent both within and between samples.
- Expected counts in each cell of the contingency table must be at least 5.
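The expected-count condition is easy to check mechanically. The sketch below, with a hypothetical helper name and made-up expected counts, lists every cell that falls below the threshold of 5.

```python
def expected_count_violations(expected, minimum=5):
    """Return (row index, column index, expected count) for cells below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]

# Made-up expected counts; one cell is too small.
expected = [[12.0, 3.5], [8.0, 6.5]]

violations = expected_count_violations(expected)
if violations:
    print("Condition not met for cells:", violations)
else:
    print("All expected counts are at least 5.")
```

An empty list means the condition holds for the whole table.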
Interpreting Observed vs. Expected Counts
- Observed counts are the actual sample data in the table (excluding totals).
- Expected counts are calculated based on the assumption that the null hypothesis is true.
- Comparing observed with expected counts indicates whether the variables are related: large differences between them are evidence against the null hypothesis.
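The notes do not spell out the calculation, so the sketch below assumes the standard formula: under independence, the expected count for a cell is (row total × column total) / grand total. The function name and the observed counts are hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (its row total * its column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Made-up observed counts (totals excluded).
observed = [[10, 20], [30, 40]]

expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, exp_row)
```

Expected counts keep the same row and column totals as the observed table, which is a quick way to check the arithmetic by hand.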
Example Overview
- An experiment tested whether listening to favorite music, hated music, or no music affects memorizing information, using three random samples (a test of homogeneity).
- Data was summarized in a contingency table.
Key Terms & Definitions
- Categorical association test — A test that determines if there is a relationship between two categorical variables.
- Chi-square test statistic — A measure comparing observed and expected frequencies in categorical data.
- Contingency table — A table displaying the frequency distribution of variables.
- Degrees of freedom — Calculation based on number of rows and columns: (rows−1)×(columns−1).
- Expected count — The frequency expected in a cell if variables are independent.
Action Items / Next Steps
- Review how to calculate expected counts for contingency tables.
- Practice identifying correct hypothesis wording for various association scenarios.
- Complete any assigned readings or problems on categorical association tests.