Categorical Data Analysis in Admissions

Aug 26, 2025

Overview

This lecture uses UC Berkeley's 1973 graduate admissions data to show how categorical data analysis methods can be used to compare admission outcomes by gender and to examine potential bias.

Introduction to the Berkeley Admissions Study

  • Researchers analyzed 1973 admissions to Berkeley’s six largest graduate departments for gender-related trends.
  • Data from that era only included binary gender categories (male and female).
  • Only 32% of admitted students were women, raising questions about possible admissions bias.

Organizing and Summarizing Categorical Data

  • Admissions data is displayed in a two-way table that classifies each applicant by gender and by admissions decision (admitted/rejected).
  • Row and column totals, plus a grand total, summarize the distribution.
  • A marginal distribution describes a single variable on its own, using the row or column totals (for example, the proportion of applicants of each gender).
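
As a minimal sketch of how such a two-way table and its totals can be assembled, the Python below starts from four cell counts. The counts are an assumption: they are the commonly reproduced six-department totals (for example in R's UCBAdmissions dataset), not numbers stated in these notes, and the function name is hypothetical.

```python
# ASSUMPTION: cell counts from the widely reproduced six-department dataset;
# these notes give only percentages.
counts = {
    "Male":   {"Admitted": 1198, "Rejected": 1493},
    "Female": {"Admitted": 557,  "Rejected": 1278},
}

def two_way_totals(table):
    """Return (row totals, column totals, grand total) for a nested dict."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for column, n in cells.items():
            col_totals[column] = col_totals.get(column, 0) + n
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand_total = two_way_totals(counts)

# Print the table with its margins.
print(f"{'':8}{'Admitted':>10}{'Rejected':>10}{'Total':>8}")
for gender, cells in counts.items():
    print(f"{gender:8}{cells['Admitted']:>10}{cells['Rejected']:>10}{row_totals[gender]:>8}")
print(f"{'Total':8}{col_totals['Admitted']:>10}{col_totals['Rejected']:>10}{grand_total:>8}")
```

The row totals feed the marginal distribution of gender, and the column totals feed the marginal distribution of the admissions decision.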

Marginal Distributions: Gender Composition of Applicants

  • 59.5% of applicants were male; 40.5% were female.
  • More men applied than women, but this alone doesn't explain the gap: women were 40.5% of applicants yet only 32% of admitted students.
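
The percentages above can be reproduced from totals, as in this short sketch. The applicant and admitted counts are assumed (the commonly cited six-department figures, which these notes do not state), and `distribution` is a hypothetical helper name.

```python
# ASSUMPTION: totals from the widely reproduced six-department dataset.
applicants = {"Male": 2691, "Female": 1835}
admitted = {"Male": 1198, "Female": 557}

def distribution(totals):
    """Convert counts for one variable into percentages of their sum."""
    total = sum(totals.values())
    return {category: round(100 * n / total, 1) for category, n in totals.items()}

applicant_share = distribution(applicants)  # gender mix of everyone who applied
admitted_share = distribution(admitted)     # gender mix of those admitted

print("Applicants:", applicant_share)
print("Admitted:  ", admitted_share)
```

Comparing the two distributions side by side is what motivates the question of bias: the female share drops between the applicant pool and the admitted group.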

Visualizing with Graphs

  • Side-by-side bar graphs compare applicant and admitted student gender proportions.
  • Effective graphs require titles, axis ticks, labels, and keys for clarity and AP exam credit.
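
One possible way to draw such a side-by-side bar graph with a title, labeled axes, tick labels, and a key is sketched below. It assumes matplotlib is installed (the plotting function imports it lazily), uses only the percentages quoted in these notes (68% male among admitted students is inferred from the 32% female figure), and the function names and output filename are hypothetical.

```python
groups = ["Male", "Female"]
series = {
    "Applicants": [59.5, 40.5],
    "Admitted students": [68.0, 32.0],  # 32% women, so 68% men
}

def bar_positions(n_groups, n_series, width=0.35):
    """x-positions that place each group's bars side by side, centred on 0..n-1."""
    start = -width * (n_series - 1) / 2
    return [[g + start + s * width for g in range(n_groups)]
            for s in range(n_series)]

def plot_side_by_side(path="gender_shares.png", width=0.35):
    """Save the graph to `path`; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    positions = bar_positions(len(groups), len(series), width)
    for xs, (label, values) in zip(positions, series.items()):
        ax.bar(xs, values, width, label=label)
    ax.set_title("Gender of applicants vs. admitted students (Berkeley, 1973)")
    ax.set_xticks(list(range(len(groups))))
    ax.set_xticklabels(groups)
    ax.set_xlabel("Gender")
    ax.set_ylabel("Percent of group")
    ax.legend(title="Group")  # the key
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_side_by_side()` writes the image; the title, axis labels, tick labels, and legend correspond to the elements listed above.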

Conditional Distributions: Admission Rates by Gender

  • Admission rate for males: 44.5%; for females: 30.4%.
  • Conditional distributions focus on one subgroup at a time (e.g., among male applicants, what percent were admitted/rejected?).
  • Segmented bar charts show proportions admitted/rejected within each gender group.
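
A conditional distribution restricts attention to one row of the two-way table and divides by that row's total. The sketch below illustrates this under the assumption that the cell counts match the commonly reproduced six-department dataset (the notes themselves give only the resulting rates); the helper name is hypothetical.

```python
# ASSUMPTION: cell counts from the widely reproduced six-department dataset.
counts = {
    "Male":   {"Admitted": 1198, "Rejected": 1493},
    "Female": {"Admitted": 557,  "Rejected": 1278},
}

def conditional_distribution(table, group):
    """Percent of each outcome among members of `group` only."""
    cells = table[group]
    group_total = sum(cells.values())
    return {outcome: round(100 * n / group_total, 1) for outcome, n in cells.items()}

male_dist = conditional_distribution(counts, "Male")
female_dist = conditional_distribution(counts, "Female")

print("Among male applicants:  ", male_dist)
print("Among female applicants:", female_dist)
```

Each dictionary corresponds to one bar of a segmented bar chart: the segments within a bar add up to 100%.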

Interpreting Associations Between Gender and Admission

  • The data show that women were admitted at a lower rate (30.4%) than men (44.5%).
  • A claim of association requires: stating the association, supporting it with comparative percentages, and giving context (mentioning variables).
  • Use comparative language like "higher" and "lower" when explaining associations.
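
The three parts of a claim of association can be treated as a template. The function below is purely illustrative (its name and parameters are hypothetical, and it only compares the highest and lowest group); it fills the template with the admission rates quoted in these notes.

```python
def describe_association(explanatory, response, outcome, rates):
    """Build a sentence that states the association, compares rates, and names the variables."""
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    (top_group, top_rate), (bottom_group, bottom_rate) = ranked[0], ranked[-1]
    if top_rate == bottom_rate:
        return (
            f"There is no apparent association between {explanatory} and {response}: "
            f"every group was {outcome} at a rate of {top_rate}%."
        )
    return (
        f"There is an association between {explanatory} and {response}: "
        f"{top_group} applicants were {outcome} at a higher rate ({top_rate}%) "
        f"than {bottom_group} applicants ({bottom_rate}%)."
    )

sentence = describe_association(
    "gender", "admission decision", "admitted", {"male": 44.5, "female": 30.4}
)
print(sentence)
```

The printed sentence names both variables, uses the comparative word "higher", and backs the claim with both percentages.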

Department-Level Analysis

  • Departments are categorized as not selective (>50% admitted), selective (26-50% admitted), or very selective (1-25% admitted).
  • Within departments, admission rates for women are higher than or similar to those for men.
  • Despite this, the overall female admit rate is lower, presenting a statistical paradox.
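
The reversal described above (often called Simpson's paradox) can be explored numerically. The sketch below assumes the department-level counts from the widely reproduced six-department dataset; the letters A through F are that dataset's labels, and neither the counts nor the labels appear in these notes.

```python
# (admitted, rejected) per department and gender.
# ASSUMPTION: counts and department letters come from the commonly
# reproduced dataset, not from these notes.
DEPARTMENTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}

def admit_rate(admitted, rejected):
    """Percent admitted out of all applicants in the cell."""
    return 100 * admitted / (admitted + rejected)

dept_rates = {
    dept: {gender: admit_rate(*cell) for gender, cell in by_gender.items()}
    for dept, by_gender in DEPARTMENTS.items()
}

overall_rates, applications = {}, {}
for gender in ("Male", "Female"):
    admitted = sum(DEPARTMENTS[d][gender][0] for d in DEPARTMENTS)
    rejected = sum(DEPARTMENTS[d][gender][1] for d in DEPARTMENTS)
    overall_rates[gender] = admit_rate(admitted, rejected)
    applications[gender] = admitted + rejected

# Departments where the female admit rate exceeds the male admit rate.
women_higher = [d for d, r in dept_rates.items() if r["Female"] > r["Male"]]

for dept, r in dept_rates.items():
    share = {g: 100 * sum(DEPARTMENTS[dept][g]) / applications[g] for g in r}
    print(f"Dept {dept}: men {r['Male']:5.1f}%  women {r['Female']:5.1f}%  "
          f"(share of applications: men {share['Male']:4.1f}%, women {share['Female']:4.1f}%)")
print(f"Overall: men {overall_rates['Male']:.1f}%  women {overall_rates['Female']:.1f}%")
```

Because each overall rate is a weighted average of the department rates, weighted by where that group applied, the application shares printed alongside each department are a useful starting point for explaining the reversal.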

Discussion and Mosaic Plots

  • A mosaic plot shows the same information as a segmented bar chart, but each bar's width is proportional to the number of applicants in that category.
  • Discussion question: How can women have higher admit rates within departments but a lower overall admit rate?
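
To make the geometry of a mosaic plot concrete, the sketch below computes the rectangles for a gender-by-decision plot inside a unit square: bar width is the group's share of all applicants, and segment height is the outcome's share within the group. The counts are assumed (the commonly cited six-department totals) and the function name is hypothetical; drawing the rectangles is left to whatever plotting tool is at hand.

```python
# ASSUMPTION: cell counts from the widely reproduced six-department dataset.
counts = {
    "Male":   {"Admitted": 1198, "Rejected": 1493},
    "Female": {"Admitted": 557,  "Rejected": 1278},
}

def mosaic_rectangles(table):
    """Return (group, outcome, x, y, width, height) tuples filling a unit square."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    rectangles = []
    x = 0.0
    for group, cells in table.items():
        group_total = sum(cells.values())
        width = group_total / grand_total   # share of all applicants
        y = 0.0
        for outcome, n in cells.items():
            height = n / group_total        # share within this group
            rectangles.append((group, outcome, x, y, width, height))
            y += height
        x += width
    return rectangles

rectangles = mosaic_rectangles(counts)
for group, outcome, x, y, width, height in rectangles:
    print(f"{group:6} {outcome:8} x={x:.3f} y={y:.3f} "
          f"width={width:.3f} height={height:.3f}")
```

With this layout each rectangle's area equals that cell's share of all applicants, which is what distinguishes a mosaic plot from a segmented bar chart with equal-width bars.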

Key Terms & Definitions

  • Two-way table — A table categorizing data by two variables.
  • Marginal distribution — The distribution of a single variable in a two-way table.
  • Conditional distribution — The breakdown of one variable for a specific subgroup of another variable.
  • Segmented bar chart — A bar graph showing subcategory proportions within a category.
  • Mosaic plot — A segmented bar chart where bar width represents sample size.

Action Items / Next Steps

  • Consider: Why do overall admit rates for women differ from within-department rates? Prepare reasoning for class discussion.
  • Review today’s graphs and summary methods for describing categorical data.
  • (Optional) Print or complete guided notes available online.