Categorical Data Description

Sep 8, 2025

Overview

This lecture covers how to describe categorical data using frequency and relative frequency tables, including how to construct these tables both manually and in Google Sheets.

Recap: Statistics Basics

  • Statistics has two main branches: descriptive statistics (summarizing data) and inferential statistics (drawing conclusions about a population from a sample).
  • A population is the whole group of interest; a sample is a subset used for analysis.
  • This course focuses on structured, table-based data, with rows as cases and columns as variables.
  • Data types: categorical (qualitative) and numerical (quantitative).
  • Categorical data can be nominal (no order) or ordinal (ordered); numerical data can be interval or ratio.
  • Data can be cross-sectional (single point in time) or time series (over time).

Describing Categorical Data

  • Categorical data are best summarized using frequency distributions (tables showing counts for each category).
  • Frequency is the count of each unique value (category) in the dataset.
  • To construct a frequency table manually: list the distinct categories, tally each occurrence, and total the tallies for each category.
  • Examples showed that different datasets can have the same frequency distribution or different ones.
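The manual procedure above (list categories, tally, total) can be sketched in Python. This is only an illustration: the blood-group values are made-up sample data, not the lecture's dataset, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return a dict mapping each distinct category to its count."""
    return dict(Counter(values))


# Hypothetical sample data (not from the lecture): blood groups of 10 people.
blood_groups = ["A", "B", "O", "A", "AB", "O", "O", "B", "A", "O"]

freq = frequency_table(blood_groups)

# Print one row per category, then the overall total.
for category in sorted(freq):
    print(f"{category:<5} {freq[category]}")
print(f"Total {sum(freq.values())}")

# A different dataset (same values in reverse order) has the same
# frequency distribution, because only the counts matter.
reordered = list(reversed(blood_groups))
print(frequency_table(reordered) == freq)
```

The frequencies always add up to the number of observations, which is a quick check on a hand-built tally.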

Creating Frequency Tables in Google Sheets

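  • Enter the data in a column with a header, select the cells, and choose Insert > Pivot table (Data > Pivot table in older versions of Google Sheets).
  • In the pivot table editor, add the categorical column under 'Rows', then add the same column under 'Values' and summarize it by COUNTA (COUNT only counts numeric cells) to produce the frequency table.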
  • This process can be used for any categorical variable, such as blood group or gender.
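For comparison, the same grouping-and-counting that the pivot table performs can be sketched in Python using only the standard library. The CSV text, the column names, and the `pivot_count` helper are all hypothetical and stand in for a sheet exported as CSV.

```python
import csv
import io
from collections import Counter

# Hypothetical spreadsheet contents exported as CSV (illustrative data only).
CSV_TEXT = """name,blood_group,gender
P1,A,F
P2,O,M
P3,O,F
P4,B,M
P5,A,F
"""


def pivot_count(csv_text, column):
    """Count rows per distinct value of `column`.

    This mirrors a pivot table with `column` under Rows and a count
    of that same column under Values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# The same function works for any categorical column.
print(pivot_count(CSV_TEXT, "blood_group"))
print(pivot_count(CSV_TEXT, "gender"))
```

Passing the column name as a parameter plays the role of choosing a different field under 'Rows' in the editor.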

Relative Frequency

  • Relative frequency is the ratio of a category's frequency to the total number of observations.
  • It is calculated by dividing each frequency by the total number of observations; the relative frequencies of all categories sum to 1 (or 100%).
  • Relative frequencies allow comparison between datasets of different sizes.
  • A relative frequency table puts every dataset on the same 0-to-1 scale (or 0% to 100%), which standardizes such comparisons.
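A short Python sketch of the calculation, assuming two made-up datasets of different sizes (4 and 100 observations). The data and the `relative_frequencies` helper are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import math
from collections import Counter


def relative_frequencies(values):
    """Return each category's frequency divided by the number of observations."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute relative frequencies of an empty dataset")
    return {category: count / total for category, count in counts.items()}


# Hypothetical datasets of different sizes (illustrative only).
small = ["A", "A", "B", "O"]                   # 4 observations
large = ["A"] * 50 + ["B"] * 25 + ["O"] * 25   # 100 observations

rel_small = relative_frequencies(small)
rel_large = relative_frequencies(large)

for category in sorted(rel_small):
    print(f"{category}: small={rel_small[category]:.2f} large={rel_large[category]:.2f}")

# Relative frequencies sum to 1 (up to floating-point rounding).
print(math.isclose(sum(rel_small.values()), 1.0))
```

The raw counts differ (2 versus 50 for category A), yet both datasets have the same relative frequency of 0.50 for A, which is what makes the comparison across sizes meaningful.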

Key Terms & Definitions

  • Descriptive Statistics — methods for summarizing data.
  • Inferential Statistics — methods for drawing conclusions about a population from a sample.
  • Structured Data — data organized in tables with rows and columns.
  • Categorical Data — data whose values are group labels rather than measurements, e.g., blood group or gender.
  • Frequency — the count of observations in each category.
  • Frequency Table — a table listing each category and its frequency.
  • Relative Frequency — the proportion or percentage of the total for each category.

Action Items / Next Steps

  • Practice constructing frequency and relative frequency tables (manually and in Google Sheets) for the given categorical datasets.
  • Prepare for next class: describing associations between two categorical variables.