📊

Categorical Data Description

Sep 8, 2025

Overview

This lecture covered how to describe categorical data with frequency tables and relative frequency tables, built both by hand and in Google Sheets.

Review of Previous Concepts

  • Statistics has two main branches: descriptive statistics and inferential statistics.
  • A population is the full set of items of interest; a sample is a subset drawn from that population.
  • The data discussed here is structured: it is arranged in tables in which each column is a variable and each row is an observation.
  • Data types include categorical (labels or categories) and numerical (numbers).
  • Cross-sectional data is collected at one point in time; time series data is collected over time.
  • Measurement scales: categorical data (nominal, ordinal); numerical data (interval, ratio).

Describing Categorical Data

  • Categorical data is summarized using frequency distributions (tables showing category counts).
  • A frequency table lists each distinct category, tally marks for occurrences, and the frequency (count).
  • Steps to construct a manual frequency table:
    1. List distinct category values.
    2. For each observation, mark a tally for its category.
    3. Count tallies for each category to get frequencies.
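
The three manual steps above can be sketched in Python. This is only an illustration, not something shown in the lecture: the function name `frequency_table` and the sample responses are hypothetical.

```python
def frequency_table(observations):
    """Build a frequency table by following the manual tally steps."""
    # Step 1: list the distinct category values, each starting at zero.
    counts = {category: 0 for category in sorted(set(observations))}
    # Step 2: for each observation, add one tally to its category.
    for value in observations:
        counts[value] += 1
    # Step 3: the accumulated tallies are the frequencies.
    return counts


# Hypothetical sample data: answers to "How do you get to campus?"
responses = ["bus", "car", "bus", "bike", "car", "bus", "walk", "bus"]

table = frequency_table(responses)
for category, count in table.items():
    tally = "|" * count
    print(f"{category:<6}{tally:<6}{count}")
```

Each printed row shows a category, its tally marks, and its frequency, matching the layout of a hand-drawn frequency table. The frequencies always add up to the number of observations, which is a quick way to check a manual tally.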

Frequency Tables in Google Sheets

  • Enter the data in a column with a header (e.g., "Category").
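  • Highlight the data, including the header, and select "Insert" > "Pivot table" (older versions of Sheets list it under "Data").
  • In the pivot table editor, add the category column under "Rows," then add the same column under "Values" and summarize it by COUNTA. COUNT only counts numeric cells, so it returns 0 for text labels.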
  • The resulting table displays categories and their counts.

Relative Frequency

  • Relative frequency = frequency of category / total number of observations.
  • Add a column to divide each frequency by the total, yielding values between 0 and 1.
  • The relative frequencies sum to 1 (allowing for rounding), which is a quick check on the arithmetic.
  • Relative frequency tables allow comparison between datasets of different sizes because proportions, unlike raw counts, do not depend on the number of observations.
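
As a rough sketch of the calculation, the Python below divides each frequency by the total and compares two samples of different sizes. The function name `relative_frequencies` and both frequency tables are made up for illustration; they are not data from the lecture.

```python
import math


def relative_frequencies(frequencies):
    """Divide each category's frequency by the total number of observations."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("need at least one observation")
    return {category: count / total for category, count in frequencies.items()}


# Hypothetical frequency tables from two samples of different sizes.
small_class = {"bus": 4, "car": 2, "bike": 1, "walk": 1}      # 8 observations
large_class = {"bus": 30, "car": 45, "bike": 15, "walk": 10}  # 100 observations

small_rel = relative_frequencies(small_class)
large_rel = relative_frequencies(large_class)

for category in small_class:
    print(f"{category:<6}{small_rel[category]:.3f}  {large_rel[category]:.3f}")

# Sanity check: relative frequencies sum to 1 (up to floating-point error).
assert math.isclose(sum(small_rel.values()), 1.0)
assert math.isclose(sum(large_rel.values()), 1.0)
```

In this made-up example the raw counts suggest the bus is far more common in the large sample (30 versus 4), yet the relative frequencies show the opposite: half of the small sample takes the bus, compared with 30 percent of the large one.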

Key Terms & Definitions

  • Descriptive statistics — summarizing and describing data features.
  • Inferential statistics — making predictions or inferences about a population from a sample.
  • Structured data — data in a tabular (row/column) format.
  • Categorical data — non-numeric data sorted by category (nominal or ordinal).
  • Frequency — count of occurrences for each category.
  • Frequency table — table showing each category and its frequency.
  • Relative frequency — proportion of each category relative to the total observations.

Action Items / Next Steps

  • Practice constructing manual frequency tables with sample categorical data.
  • Create frequency and relative frequency tables using Google Sheets.
  • Complete exercises calculating relative frequencies for given datasets.